Introduction


The first question to ask when designing an assessment of reading and language skills is what predicts 
success in comprehending written language, that is, success in word reading and in reading 
comprehension? We are fortunate to have several consensus documents that review decades of 
literature about what predicts reading success (NRC, 1998; NICHD, 2000; NIFL, 2008; RAND, 2002;
Rayner, Foorman, Perfetti, Pesetsky, & Seidenberg, 2001). 


Mastering the Alphabetic Principle 


What matters the most to success in reading words in an alphabetic orthography such as English is 
mastering the alphabetic principle, the insight that speech can be segmented into discrete units (i.e., 
phonemes) that map onto orthographic (i.e., graphemic) units (Ehri, Nunes, Willows, et al., 2001; Rayner 
et al., 2001). Oral language is acquired largely in a natural manner within a hearing/speaking 
community; however, written language is not acquired naturally because the graphemes and their 
relation to phonological units in speech are invented and must be taught by literate members of the 
community. The various writing systems (i.e., orthographies) of the world vary in the transparency of 
the sound-symbol relation. Among alphabetic orthographies, the Finnish orthography is highly 
transparent: phonemes in speech relate to graphemes in print (i.e., spelling) in a highly consistent one- 
to-one manner and graphemes in print relate to phonemes in speech (i.e., decoding) in a highly 
consistent one-to-one manner. Thus, learning to spell and read Finnish is relatively easy. English, 
however, is a more opaque orthography. Phonemes often relate to graphemes in an inconsistent 
manner and graphemes relate to phonemes in yet a different inconsistent manner. For example, if we 
hear the “long sound of a” we can think of words with many different vowel spellings, such as crate, 
brain, hay, they, maybe, eight, great, vein. If we see the orthographic unit -ough, we may struggle with
the various pronunciations of cough, tough, though, bough. The good news is that 69% of monosyllabic 
English words—those Anglo-Saxon words most used in beginning reading instruction—are consistent in 
their letter to pronunciation mapping (Ziegler, Stone, & Jacobs, 1997). Most of the rest can be learned 
with grapheme-phoneme correspondence rules (i.e., phonics), with only a small percentage of words 
being so irregular in their letter-sound relations that they should be taught as sight words (Ehri, Nunes, 
Stahl, & Willows, 2001; Foorman & Connor, 2011). 


In grades 3-12, alphabetic skills are measured with a word recognition task. In this computer-adaptive 
task, three words are presented on the computer monitor and students must select the word that best 
matches the word pronounced by the computer. About 10% of target words are nonsense words so that 
phonological decoding skills are tapped. When the target is a real word, distractors tap orthographic 
knowledge. For example, a distractor for “prerogative” might be perogative. By tapping orthographic 
knowledge in this task, the quality of a student’s lexical representation for a printed word is assessed. 
The more complete and accurate the lexical representation of a word is, the more efficient the student’s 
word recognition and reading comprehension (Perfetti & Stafura, 2014). 
Comprehending Written Language (better known as Reading Comprehension)


Knowledge of word meanings. Mastering the alphabetic principle is a necessary but not 
sufficient condition for understanding written text. We may be able to pronounce printed words, but if 
we don’t know their meaning our comprehension of the text is likely to be impeded. Hence, our 
knowledge of word meanings is crucial to comprehending what we read. Grasping the meaning of a 
word is more than knowing its definition in a particular passage. Knowing the meaning of a word means 
knowing its full lexical entry in a dictionary: pronunciation, spelling, multiple meanings in a variety of 
contexts, synonyms, antonyms, idiomatic use, related words, etymology, and morphological structure. 
For example, a dictionary entry for the word exacerbate says that it is a verb meaning: 1) to increase the 
severity, bitterness, or violence of (disease, ill feeling, etc.); aggravate or 2) to embitter the feelings of (a 
person); irritate; exasperate (e.g., foolish words that only exacerbated the quarrel). It comes from the 
Latin word exacerbatus (the past participle of exacerbare: to exasperate, provoke), equivalent to ex + 
acerbatus (acerbate). Synonyms are: intensify, inflame, worsen, embitter. Antonyms are: relieve, soothe,
alleviate, assuage. Idiomatic equivalents are: add fuel to the flame, fan the flames, feed the fire, or pour
oil on the fire. The more a reader knows about the meaning of a word like exacerbate, the greater the 
lexical quality the reader has and the more likely the reader will be able to recognize the word quickly in 
text, with full comprehension of its meaning (Perfetti & Stafura, 2014). 


In the grades 3-12 FRA, knowledge of word meanings is measured by a Vocabulary Knowledge Task that 
taps morphological awareness. In the Vocabulary Knowledge Task, the student reads a sentence that 
has a missing word. The student selects among three words the one that best completes the sentence. 
The distractors and target vary in their morphological structure (i.e., prefixes or suffixes consisting of 
inflectional morphemes or derivational morphemes). It is relatively easy to read derived words that are 
pronounced similarly to their base (e.g., reason, reasonable). Words that contain a phonological shift 
(e.g., vine, vineyard) or an orthographic shift (e.g., pity, piteous) are harder to read, and words that 
contain both a phonological and an orthographic shift (e.g., theory, theoretical) are the hardest of all 
(Carlisle & Stone, 2005). The Vocabulary Knowledge Task in the FRA explained 2%-9% of unique variance
in spring reading comprehension beyond prior reading comprehension, text reading efficiency, and
spelling (Foorman, Petscher, & Bishop, 2012). The task thus addresses aspects of language that are
critical to understanding written language. This language is often called academic language because it
is found in books and at school but not in informal conversations at home or outside school. Part of academic
language is inferential language or decontextualized language, which allows speakers or writers to go 
beyond the present context and to predict, hypothesize, compare and contrast, and reason about 
events (e.g., an upcoming referendum) or abstract concepts (e.g., photosynthesis, gravity). Examples of 
words that signal such inferential or decontextualized language are describe, analyze, hypothesize. 


Syntactic awareness. In addition to understanding word meanings, another important aspect 
of academic language is syntactic awareness. Syntax or grammar refers to the rules that govern how 
words are ordered to make meaningful sentences. Children typically acquire these rules in their native 
language prior to formal schooling. However, learning to apply these rules to reading and writing is a 
goal of formal schooling and takes years of instruction and practice. In the grades 3-12 FRA, there is a
diagnostic task called the Syntactic Knowledge Task (SKT). In this task the student listens to a sentence that
is missing a word and selects the best word from a dropdown menu to complete the sentence. The 
words are verbs, pronouns, or connectives. Connectives are words that represent causal (e.g., because), 
temporal (e.g., when), logical (e.g., if-then), additive (e.g., in addition), or adversative (e.g., although) 
relations and are important linguistic devices for linking ideas and information within and across 
sentences. They link back to information already read through pronoun reference (anaphora) or 
repetition of nouns and verbs and provide clues to future meaning (e.g., therefore, nonetheless). 
Knowledge of the meaning and use of connectives is an important aid to comprehension (Cain & Nash, 
2011; Crosson & Lesaux, 2013). 


Reading comprehension. If a student can read and understand the meanings of printed words 
and sentences, then comprehending text should not be difficult, given the emphasis above on achieving 
the alphabetic principle, lexical quality, and syntactic awareness. Individual differences in readers’ 
background knowledge, motivation, and memory and attention will create variability in word 
recognition skills, vocabulary knowledge, and syntactic awareness and this variability, in turn, will create 
variability in reading comprehension. Furthermore, genre differences—informational or literary text— 
may interact with reader skills to affect reading comprehension. For example, some students may have 
better inferential language skills so critical to comprehending informational text; other students may 
have better narrative language skills of discerning story structure and character motivation and, 
therefore, be good comprehenders of literary text. Because reading comprehension is affected by the 
interactions of variables related to reader and text characteristics (RAND, 2002), tests of reading 
comprehension typically consist of informational and literary passages and provide as much relevant 
background information within the passage as possible. 


States’ reading comprehension tests typically have questions written to their state standards. One 
challenge for these tests is the trade-off among coverage of the standards, time, and reliability.
Typically, one should strive for about 15 items per standard. If a state has 14 standards per grade, then 
210 questions would be needed to reliably cover the standards. If 7-9 questions are written for each 
passage, then students would need to read 23-30 passages, which would take them about 10 days. Most 
states prioritize testing the superordinate standards in order to reduce the testing time to 7 passages or 
so over two days. A limitation of many standards-based tests is their sole focus on grade-level 
proficiency. Students are given only grade-level passages; therefore, students who read below grade 
level tend to guess and students who read above grade level are not challenged. In both cases, no 
information about their actual reading ability is obtained. Furthermore, when the grade level of 
passages is determined by readability formulae or by qualitative ratings, the precision is not at a 
particular grade but rather within grade bands of two to three grades (e.g., upper elementary, middle 
school, high school; Foorman, 2009; Nelson, Perfetti, Liben, & Liben, 2012). 
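
The item-count estimate in this paragraph can be written out step by step. The block below uses only the figures stated above (15 items per standard, 14 standards per grade, 7 to 9 questions per passage); the passage count for 9 questions per passage is rounded to the nearest whole passage, as in the text.

```latex
\begin{align*}
\text{items needed} &= 15 \times 14 = 210 \\
\text{passages needed at 9 questions each} &= 210 / 9 \approx 23 \\
\text{passages needed at 7 questions each} &= 210 / 7 = 30
\end{align*}
```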


The FRA Reading Comprehension task in grades 3-12 avoids the problems with precision and efficiency 
noted above by being a computer-adaptive test. Students are placed into their first reading 
comprehension passage based on their ability on the computer-adaptive Word Recognition and
Vocabulary Knowledge Tasks—which take 2-3 minutes each. The student reads the passage and answers 
the 7-9 multiple choice questions. Subsequent passage placement is based on relations among student 
ability, standard error, and discrimination parameters from a 2-parameter logistic item response theory 
(IRT) model. Students continue to receive passages until a precise estimate of reading comprehension is 
achieved (i.e., reliability >.80). In the FRA, students receive 1-3 passages in about 10-30 minutes. Given 
that the two Screening tasks and one Diagnostic task take, on average, 11 minutes, the entire 3-12 
battery easily fits into a 45-minute class period. During the 2013-2014 implementation study in Pinellas 
County, reliability on the Reading Comprehension task was above .80 for 93 percent of students and 
above .90 for 54 percent of students. 


Individual tasks in the FRA yield two score types—percentile ranks and ability scores. The ability score is 
used to measure growth and can be displayed against grade-level percentile ranks to communicate the 
important point that students are improving across the year even though they are performing far below 
or above grade-level peers. 


Summary of FRA Constructs and Tasks 


The FRA consists of computer-adaptive reading comprehension and oral language screening tasks that 
provide measures to track growth over time, as well as a Probability of Literacy Success (PLS) linked to 
grade-level performance (i.e., the 50th percentile) on the reading comprehension subtest of the Stanford
Achievement Test (SAT-10) in the 2014-2015 school year. Thus, the FRA provides universal screening and 
diagnostic tasks in a precise and efficient computer-adaptive framework with psychometrics and norms 
derived from large samples of Florida K-12 students representative of Florida demographics. The 
diversity of Florida’s demographics is a microcosm of the United States. By including Vocabulary 
Knowledge and Syntactic Knowledge Tasks, the FRA has excellent construct coverage of oral language,
which has been shown to account for the vast majority (i.e., 72%-96%, with a median of 87%) of 
individual differences in reading comprehension in grades 4-10 (Foorman, Koon, Petscher, Mitchell, & 
Truckenmiller, 2015) and comparable variance to decoding fluency in grades 1-2 (Foorman, Herrera, 
Petscher, Mitchell, & Truckenmiller, 2015). 


Description of the Tasks in the FRA 


Item development. Item development was broadly based on the empirical theories regarding 
reading development described above. Retention for specific items was based principally on the 
statistical properties of the items and is detailed in the Description of Method section. Items were 
originally written and reviewed by a team of experienced educators with advanced degrees in 
education, communication, and psychology. Item writers generally wrote to late elementary, middle, 
and high school students using vocabulary and text complexity that the writers had experienced in 
typical curricula and materials targeted to those age groups. Item writers created a variety of items that 
they considered to be easy, moderate, and difficult for the range of students. Writers were asked to 
provide a larger number of easier and moderate items. Given that screening assessments are more 
commonly given to lower performing students and those students are assessed more frequently, the
item bank needed to have a large number of easy and moderate items so that there were enough items 
in the item bank that students did not have to see the same items each year. Each item was reviewed 
by at least three other members of the review team for errors and appropriateness. All items in the 
Reading Comprehension task were aligned with a standard from the Common Core State Standards. 


Target words for the WRT and VKT tasks were based on pilot work with a small group of students and 
printed word frequency (Zeno, Ivens, Millard, & Duvvuri, 1995). A rough estimate of the range in 
difficulty of the sentences in the VKT and SKT tasks was obtained through use of the Flesch-Kincaid 
grade-level readability formula. 
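
For readers unfamiliar with the Flesch-Kincaid grade-level formula, a minimal sketch follows. It applies the standard published formula to raw counts; how words, sentences, and syllables were counted for the FRA sentences is not described here, so the function takes the counts as inputs, and the example counts are hypothetical.

```python
def flesch_kincaid_grade(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Flesch-Kincaid grade level computed from raw text counts."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Hypothetical counts for a single sentence: 12 words and 18 syllables.
grade = flesch_kincaid_grade(total_words=12, total_sentences=1, total_syllables=18)
print(f"Estimated grade level: {grade:.1f}")
```

Because a single sentence gives the formula very little text to work with, the result is best read as the rough estimate the paragraph describes.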


Passages and items in the Reading Comprehension Task were written to address the ELA Common Core 
Standards in three strands (Reading Informational Text, Reading Literary Text, and Language). Item
writers also reviewed publicly available examples from the Partnership for Assessment of Readiness for 
College and Careers and the SmarterBalanced Consortium. The range of text complexity of the passages 
was evaluated for a variety of freely available quantitative measures (i.e., Lexile, Flesch-Kincaid, Pearson 
Maturity Metric, Text Evaluator, ATOS, and Degrees of Reading Power) and the qualitative rating guide 
from Appendix A of the Common Core State Standards. The passages in elementary grades were 
originally written to be evenly split between literary and informational passages. The passage and item 
difficulty was ultimately determined by the normative sample’s performance on the task, so the 
resulting item bank is split 42% literary passages and 58% informational. Since the goal of this 
assessment is to cover the range of student ability as opposed to equally addressing all standards, the 
guideline for item creation on the Reading Comprehension task was to make 30% of the items focused
on vocabulary and 70% of the items focused on explicit and inferential comprehension questions. The 
comprehension items for elementary aged students were split evenly between explicit and implicit 
questions with the percentage favoring implicit questions at the upper grade levels. 


Word Recognition Task (WRT). In the Word Recognition Task, the student listens to a word 
pronounced by the computer. The computer monitor displays a drop-down menu with the correctly 
spelled word and two distractors that are spelled incorrectly. The student may replay the audio for the 
word up to three times. The student has unlimited time to respond to each item. The item bank contains 
274 available items and includes real words and some non-words. The range of possible theta scores in 
the WRT is -3.88 to 3.85. This range corresponds to an ability score range of 112 to 885. 


Vocabulary Knowledge Task (VKT). Each item in the Vocabulary Knowledge Task consists of one 
sentence with a word missing. The missing word is replaced with a choice of three morphologically 
related words. The student selects the word that best completes the sentence. There are 374 items 
available. The student has unlimited time to respond to each item. The range of possible theta scores in 
the VKT is -2.55 to 3.59. This range corresponds to an ability score range of 245 to 859. 


Reading Comprehension (RC). The Reading Comprehension task consists of passages that are 
between 200 and 1300 words in length. Each passage has between 7 and 9 multiple choice questions. 
Each question has one correct response and three distractors. All questions associated with the passage
are displayed at the same time and the passage is also available on the computer monitor. Each 
question has an individual item difficulty and discrimination value. Each set of 7 to 9 questions has an 
average item difficulty, which is used to determine which set of questions (and associated passage) is 
administered to the student next. The Reading Comprehension task ends when a reliable score has been 
reached (i.e., the standard error is less than 0.50) or the student has responded to three sets of 
questions. The initial set of questions administered to a student is determined by a formula that includes 
the student’s score on the WRT and the VKT. The computer will automatically log out students after 15 
minutes of inactivity; otherwise, students have an unlimited amount of time to read the passage and 
respond to questions. There are a total of 139 sets of questions associated with passages available in the 
grades 3-12 FRA. The range of possible theta scores in the RC task is -2.80 to 5.24. This range corresponds to
an ability score range of 220 to 1024. 
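
The selection of the next question set can be sketched as follows. The text states only that each set's average item difficulty is used to choose the next set; the rule used here (pick the unseen set whose average difficulty is closest to the current ability estimate) is an assumption made for illustration, and the names QuestionSet and choose_next_set are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class QuestionSet:
    passage_id: str
    item_difficulties: Tuple[float, ...]  # IRT difficulty of each of the 7-9 questions

    @property
    def average_difficulty(self) -> float:
        return sum(self.item_difficulties) / len(self.item_difficulties)


def choose_next_set(theta: float, bank: Iterable[QuestionSet],
                    administered_ids: Set[str]) -> Optional[QuestionSet]:
    """Return the unseen set whose average difficulty is nearest theta (assumed rule)."""
    candidates = [s for s in bank if s.passage_id not in administered_ids]
    if not candidates:
        return None
    return min(candidates, key=lambda s: abs(s.average_difficulty - theta))


# Hypothetical three-set bank; real sets contain 7 to 9 questions each.
bank = [
    QuestionSet("A", (-1.2, -1.0, -0.8)),
    QuestionSet("B", (-0.2, 0.0, 0.2)),
    QuestionSet("C", (1.3, 1.5, 1.7)),
]
next_set = choose_next_set(theta=0.2, bank=bank, administered_ids=set())
print(next_set.passage_id)
```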


Syntactic Knowledge Task (SKT). In the Syntactic Knowledge Task, the student listens to one or
more sentences read by the computer, with one word or phrase missing. The computer monitor also
displays the sentence(s) so that the student can read along. The missing word or phrase is replaced
by a dropdown box with the correct word or phrase and two distractors. There are a total of 240 items
available. Some items require a student to select the correct connective word, the correct pronoun 
reference, or the correct verb that creates appropriate subject-verb agreement. The range of possible 
theta scores in the SKT is -3.08 to 3.34. This range corresponds to an ability score range of 192 to 834. 
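
The four reported ranges are all consistent with a single linear rescaling, ability score = 500 + 100 × theta. That rescaling is not stated in the text; it is inferred here from the reported endpoints only, so the sketch below should be read as a consistency check rather than as the documented scoring rule.

```python
def ability_score(theta: float) -> int:
    """Rescaling inferred from the reported ranges: 500 + 100 * theta, rounded."""
    return round(500 + 100 * theta)


# (theta range, ability score range) as reported for each task.
REPORTED_RANGES = {
    "WRT": ((-3.88, 3.85), (112, 885)),
    "VKT": ((-2.55, 3.59), (245, 859)),
    "RC": ((-2.80, 5.24), (220, 1024)),
    "SKT": ((-3.08, 3.34), (192, 834)),
}

for task, ((theta_min, theta_max), reported) in REPORTED_RANGES.items():
    computed = (ability_score(theta_min), ability_score(theta_max))
    print(task, computed, "matches reported range:", computed == reported)
```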


Task Administration. In grades 3 through 12, the FRA consists of four computer-adaptive tasks
that each provide unique information regarding a student’s literacy skills. Each of these tasks,
except for Reading Comprehension, has four stop rules that determine when administration of the
task is complete.¹


1. A reliable estimate of the student’s abilities is reached (i.e., standard error is less than 0.50).
2. The student has responded to 30 items.
3. The student responds correctly to all of the first 8 items.
4. The student responds incorrectly to all of the first 8 items.
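
The four stop rules can be sketched as a single check that runs after each response. The function name, the returned labels, and the order in which the rules are tested are assumptions made for illustration; the thresholds are the ones listed above. Reading Comprehension is not covered because it uses different stop rules.

```python
from typing import Optional, Sequence


def stop_reason(responses: Sequence[bool], standard_error: float,
                se_threshold: float = 0.50, max_items: int = 30,
                run_length: int = 8) -> Optional[str]:
    """Return the stop rule that applies, or None if the task should continue.

    responses holds True for each correct answer and False for each incorrect one.
    """
    if standard_error < se_threshold:
        return "reliable_estimate"
    if len(responses) >= max_items:
        return "max_items"
    if len(responses) >= run_length:
        first_items = responses[:run_length]
        if all(first_items):
            return "first_items_all_correct"
        if not any(first_items):
            return "first_items_all_incorrect"
    return None


# A student who answers the first 8 items correctly before the estimate is reliable.
print(stop_reason([True] * 8, standard_error=0.90))
```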


At subsequent administrations of the tasks within the same school year, the student’s prior score on 
that task determines the initial set of items administered to the student at that administration period. 


¹ The stop rules for Reading Comprehension are a maximum of three passages or a reliable estimate of
the student’s ability (i.e., standard error < 0.50). 
The tasks in the FRA can be used as a highly efficient diagnostic tool because they are computer
adaptive. Computer administration allows for large groups of students to be assessed at
once with a high degree of standardization. Adaptability in the items allows for a highly reliable score to 
be reached sooner and decreases the amount of time needed for each task. Although educators are 
most concerned with students’ abilities in reading comprehension, it is a complex skill that takes 
significant amounts of time to assess (due to close reading of extended text) and poor performance 
does not necessarily signal which component skills of reading to target for instruction. The FRA 
efficiently assesses multiple research-based component skills of reading comprehension to help 
teachers diagnose skill weaknesses and target instruction. During the implementation study, more than 
98% of students reached a highly reliable score (marginal reliability above .80) by taking an average of 
only 20 items on the WRT, 9 items on the VKT, and 18 items on the SKT. Table 1 provides a description 
of the efficiency of each task. The increase in efficiency allows for more tasks to be administered to 
achieve a more complete diagnostic profile for a student. For example, in the implementation study 84% 
of students in grades 3 through 12 completed all four of the computer-adaptive tasks within one class 
period (i.e., 45 minutes). 
Table 1
Task Efficiency

                                      Word          Vocabulary    Syntactic     Reading
                                      Recognition   Knowledge     Knowledge     Comprehension
                                      Task          Task          Task          Task
Number of items administered
  mean                                20            9             17
  median                              19            8             16
  administered 30 items               31%           2%            15%
Passages administered (% students)
  1 passage                                                                     9.7%
  2 passages                                                                    22.7%
  3 passages                                                                    67.6%
Reliability
  marginal reliability coefficient    0.93          0.91          0.93          0.94
  Cronbach's alpha > .9               82%           98%           87%           54%
  Cronbach's alpha > .8               98%           99%           99%           93%
Time (minutes : seconds)
  mean                                3:04          2:06          3:54          NA*
  median                              2:36          1:40          3:30          NA*
  directions time                     0:42          0:24          0:35          0:15


*The mean and median values for amount of time spent on the Reading Comprehension Task are not
available due to the nature of the task.


FRA Introduction 


© 2014 Florida State University. All Rights Reserved. 




Description of Method 


Item tryout and validation work with the above tasks occurred from 2010 to 2015 through the funding
provided by two IES grants (see Acknowledgements). Once item writers had written items for each task, 
tasks were piloted with students in grades 3-12. Results from Item Response Theory (IRT) analyses were 
evaluated and in several cases items were deleted or more difficult items were written and further field 
trials were conducted. A large-scale linking study was conducted during the spring of 2013 with
approximately 45,000 students in grades 3 through 12 in two districts in Florida. Outcome data
consisted of well-known standardized measures of reading comprehension (Gates-MacGinitie and the
SAT-10). Item response and differential item functioning analyses were conducted. Parameters derived
from these analyses are used in the look-up tables in the computer-adaptive system. 


Item Response Theory 


Data for the grades 3-12 FRA were analyzed using Item Response Theory (IRT). Traditional testing and 
analysis of items involves estimating the difficulty of the item (based on the percentage of respondents 
correctly answering the item) as well as discrimination (how well individual items relate to overall test 
performance). This falls into the realm of measurement known as classical test theory (CTT). While such 
practices are commonplace in assessment development, IRT holds several advantages over CTT. When 
using CTT, the difficulty of an item depends on the group of individuals on which the data were 
collected. This means that if a sample has more students who perform at an above-average level, the
items will appear easier; but if the sample has more below-average performers, the items will
appear to be more difficult. Similarly, the more that students differ in their ability, the more likely the 
discrimination of the items will be high; the more that the students are similar in their ability, the lower 
the discrimination will be. One could correctly infer that scores from a CTT approach are entirely 
dependent on the makeup of the sample on which the items are tested. 
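
The sample dependence of the classical difficulty index can be illustrated with a small calculation. The sketch below assumes one fixed item whose response probabilities follow a logistic curve of the kind described later in this section, with assumed parameters a = 1.0 and b = 0.0, and two hypothetical samples of ability values. The item itself never changes, yet its expected proportion correct differs between the samples.

```python
import math


def p_correct(theta: float, a: float = 1.0, b: float = 0.0) -> float:
    """Probability of a correct response for one fixed logistic item (assumed parameters)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def classical_p_value(sample_thetas) -> float:
    """Expected proportion correct for the item in a given sample (the CTT difficulty)."""
    return sum(p_correct(theta) for theta in sample_thetas) / len(sample_thetas)


above_average_sample = [0.5, 1.0, 1.5, 2.0]      # hypothetical ability values
below_average_sample = [-2.0, -1.5, -1.0, -0.5]  # hypothetical ability values

print(f"above-average sample: {classical_p_value(above_average_sample):.2f}")
print(f"below-average sample: {classical_p_value(below_average_sample):.2f}")
```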


The benefits of IRT are such that: 1) the difficulty, discrimination, and pseudo-guessing parameters are 
not dependent on the group(s) from which they were initially estimated; 2) scores describing students’ 
ability are not related to the difficulty of the test; 3) shorter tests can be created that are more reliable 
than a longer test; and, 4) item statistics and the ability of students are reported on the same scale. 
Item Difficulty. The difficulty of an item has traditionally been described for many tests as a
“p-value”, which corresponds to the percent of respondents correctly answering an item. Values from this
perspective range from 0% to 100% with high values indicating easier items and low values indicating 
hard items. Item difficulty in an IRT model does not represent proportion correct, but is rather 
represented as estimates along a continuum of -3.0 to +3.0. Figure 1 demonstrates a sample item 
characteristic curve which describes item properties from IRT. Along the x-axis is the ability of the 
individual, denoted by theta. As previously mentioned, the ability of students and item statistics are 
reported on the same scale. Thus, the x-axis is a simultaneous representation of student ability and item 
difficulty. Negative values along the x-axis will indicate that items are easier, while positive values 
describe harder items. Pertaining to students, negative values describe individuals who perform below 
average, while positive values identify students who perform above average. A value of zero for both 
students and items reflects average level of either ability or difficulty. 


Along the y-axis is the probability of a correct response, which varies across the level of difficulty. Item 
difficulty is defined as the value on the x-axis at which the probability of correctly endorsing the item is 
0.50. As demonstrated for the sample item in Figure 1, the difficulty of this item would be 0.0. Item 
characteristic curves are graphical representations generated for each item that allow the user to see 
how the probability of getting the item correct changes for different levels of the x-axis. Students with 
an ability of -3.0 would have approximately a 1% chance of getting the item correct, while students
with an ability of 3.0 would have a nearly 99% chance of getting an item correct. 
Figure 1. Sample Item Characteristic Curve
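
A curve like the sample item can be reproduced with the 2-parameter logistic function. In the sketch below the difficulty b = 0.0 comes from the text, while the discrimination a = 1.5 is an assumed value chosen so that the probabilities at abilities of -3.0 and 3.0 land near the 1% and 99% described above. The parameters of the item actually plotted are not reported.

```python
import math


def icc_2pl(theta: float, a: float, b: float) -> float:
    """2-parameter logistic item characteristic curve: P(correct | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# a = 1.5 is an assumption for illustration; b = 0.0 matches the sample item.
for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    probability = icc_2pl(theta, a=1.5, b=0.0)
    print(f"theta = {theta:+.1f}   P(correct) = {probability:.3f}")
```

The probability is exactly 0.50 where theta equals b, which is the definition of item difficulty used here.
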
Item Discrimination. Item discrimination concerns the relationship between how a student
responds to an item and how that student performs on the rest of the test. In IRT it describes the
extent to which an item can differentiate the probability of correctly endorsing an item across the range 
of ability (i.e., -3.0 to +3.0). Figure 2 provides an example of how discrimination operates in the IRT 
framework. For all three items presented in Figure 2, the difficulty has been held constant at 0.0, while 
the discriminations are variable. The dashed line (Item 1) shows an item with strong discrimination, the 
solid line (Item 2) represents an item with acceptable discrimination, and the dotted line (Item 3) is 
indicative of an item that does not discriminate. It is observed that for Item 3, regardless of the level of 
ability for a student, the probability of getting the item right is the same. Both high ability students and 
low ability students have the same chance of doing well on this item. Item 1 demonstrates that as ability
on the x-axis increases, the probability of getting the item correct increases as well. Notice that small changes
between -1.0 and +1.0 on the x-axis result in large changes on the y-axis. This indicates that the item 
discriminates well among students, and that individuals with higher ability have a greater probability of 
getting the item correct. Item 2 shows that while an increase in ability produces an increase in the 
probability of a correct response, the increase is not as large as is observed for Item 1, and is thus a 
poorer discriminating item. 
Figure 2. Sample Item Characteristic Curves with Varied Discriminations
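
The contrast among the three items can be made concrete by computing how much the probability of a correct response rises between abilities of -1.0 and +1.0. The discrimination values below (2.5, 1.0, and 0.0) are assumptions chosen to mimic a strong, an acceptable, and a non-discriminating item; the values behind the plotted curves are not reported.

```python
import math


def icc_2pl(theta: float, a: float, b: float = 0.0) -> float:
    """2-parameter logistic curve with difficulty held at 0.0 by default."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def rise_between(a: float, low: float = -1.0, high: float = 1.0) -> float:
    """Change in P(correct) as ability moves from low to high."""
    return icc_2pl(high, a) - icc_2pl(low, a)


# Assumed discriminations for illustration only.
ASSUMED_DISCRIMINATIONS = {"Item 1": 2.5, "Item 2": 1.0, "Item 3": 0.0}

for item, a in ASSUMED_DISCRIMINATIONS.items():
    print(f"{item}: a = {a:.1f}, rise from -1.0 to +1.0 = {rise_between(a):.3f}")
```

With a discrimination of zero the curve is flat at 0.50, which corresponds to the item for which high and low ability students have the same chance of answering correctly.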


Guidelines for Retaining Items 


Several criteria were used to evaluate item validity. The first step was to identify items that
demonstrated strong floor or ceiling effects (i.e., response rates >= 95%). Such items are not useful in
creating an item bank as there is little variability in whether students are successful on the item. In
addition to evaluating the descriptive response rate, we estimated item-total correlations. Negative
item-total correlations indicate poor functioning: they suggest that individuals who correctly answer
the question tend to have lower total scores. Similarly, items with low item-total correlations indicate
the lack of a relation between item and total test performance. Items with correlations <.15 were
flagged for removal. Following the descriptive analysis of item performance, difficulty and discrimination 
values from the IRT analyses were used to further identify items which were poorly functioning. Items 
were flagged for item revision if the item discrimination was negative or the item difficulty was greater 
than +4.0 or less than -4.0. 
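
The screening criteria in this paragraph can be sketched as a set of flags applied to each item. The class and function names are hypothetical, and the reading of the floor and ceiling criterion (at most 5% or at least 95% of students answering correctly) is an assumption about how the 95% response-rate threshold was applied.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ItemStats:
    item_id: str
    proportion_correct: float       # descriptive response rate
    item_total_correlation: float
    discrimination: float           # IRT discrimination parameter
    difficulty: float               # IRT difficulty parameter


def review_flags(item: ItemStats) -> List[str]:
    """Return the reasons, if any, that an item would be flagged for removal or revision."""
    flags = []
    # Assumed reading of the floor/ceiling rule: nearly everyone wrong or nearly everyone right.
    if item.proportion_correct >= 0.95 or item.proportion_correct <= 0.05:
        flags.append("floor_or_ceiling")
    if item.item_total_correlation < 0.15:
        flags.append("low_item_total_correlation")
    if item.discrimination < 0:
        flags.append("negative_discrimination")
    if abs(item.difficulty) > 4.0:
        flags.append("extreme_difficulty")
    return flags


# Hypothetical item that nearly every student answers correctly.
example = ItemStats("item_001", 0.97, 0.22, 0.8, -4.3)
print(review_flags(example))
```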


A secondary criterion, a differential item functioning (DIF) analysis, was used in evaluating the
retained items. DIF refers to instances where individuals from different groups with the
same level of underlying ability differ significantly in their probability of correctly endorsing an item.
Unchecked, items included in a test which demonstrate DIF will produce biased test results. For the FRA 
assessments, DIF testing compared Black and White students, Latino and White students, Black and
Latino students, students eligible for Free or Reduced-Price Lunch (FRL) and students not receiving
FRL, and English Language Learner and non-English Language Learner students.


DIF testing was conducted with a multiple indicator multiple cause (MIMIC) analysis in Mplus (Muthén & 
Muthén, 2008). In addition, a series of four standardized and expected score effect size measures were 
generated using VisualDF software (Meade, 2010) to quantify various technical aspects of score 
differentiation between the comparison groups. First, the signed item difference in the sample (SIDS) index 
was created, which describes the average unstandardized difference in expected scores between the 
groups. The second effect size calculated was the unsigned item difference in the sample (UIDS). This 
index supplements the SIDS. When the absolute values of the SIDS and UIDS 
are equal, the differential functioning is uniform, that is, it favors the same group across the ability range; 
however, when the absolute value of the UIDS is larger than that of the SIDS, it provides evidence that the item 
characteristic curves for expected score differences cross, indicating that differences in the expected scores 
between groups change across the level of the latent ability score. The D-max index is reported as the maximum SIDS 
value in the sample, and may be interpreted as the greatest difference for any individual in the sample 
in the expected response. Lastly, an expected score standardized difference (ESSD) was generated, and 
was computed similarly to Cohen’s (1988) d statistic. As such, it is interpreted as a measure of standard 
deviation difference between the groups for the expected score response, with values of .2 regarded as 
small, .5 as medium, and .8 as large. 
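

To make the four effect sizes concrete, the following sketch computes them for a single dichotomous item under a 2PL model. It follows the general definitions given above and is not the VisualDF implementation; the function names, the evaluation of both groups' item parameters at the focal group's ability estimates, and the pooled standard deviation formula are assumptions.

```python
import math
from statistics import mean, variance


def p_2pl(theta, a, b):
    """Expected score (probability correct) for a dichotomous 2PL item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def dif_effect_sizes(focal_thetas, focal_params, ref_params):
    """SIDS, UIDS, D-max, and ESSD for one item.

    focal_thetas: ability estimates for members of the focal group
    focal_params, ref_params: (a, b) item parameters estimated in each group
    """
    es_focal = [p_2pl(t, *focal_params) for t in focal_thetas]
    es_ref = [p_2pl(t, *ref_params) for t in focal_thetas]
    diffs = [f - r for f, r in zip(es_focal, es_ref)]

    sids = mean(diffs)                      # signed average difference
    uids = mean(abs(d) for d in diffs)      # unsigned average difference
    d_max = max(diffs, key=abs)             # largest individual difference
    pooled_sd = math.sqrt((variance(es_focal) + variance(es_ref)) / 2.0)
    essd = (mean(es_focal) - mean(es_ref)) / pooled_sd
    return {"SIDS": sids, "UIDS": uids, "D-max": d_max, "ESSD": essd}
```

With equal discriminations and different difficulties, |SIDS| equals UIDS (uniform DIF). With equal difficulties and different discriminations, the expected score curves cross, so UIDS exceeds |SIDS|.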


Linking Design & Item Response Analytic Framework 


A common-item, non-equivalent groups design was used for collecting data in our pilot, calibration, and 
validation studies. A strength of this approach is that it allows for linking multiple test forms via common 
items. For each task, a minimum of 20 percent of the total items within a form were identified as 
vertical linking items to create a vertical scale. These items served the dual purpose of linking 
forms across grades to each other and linking forms within grades to each other. 
Because the tasks in the FRA were each designed for vertical equating and scaling, we considered two 
primary frameworks for estimating the item parameters: 1) a multiple-group IRT of all test forms or 2) 
test characteristic curve equating. We chose the latter approach using Stocking and Lord (1983) to place 
the items on a common scale. All item analyses were conducted using Mplus software (Muthén & 
Muthén, 2008) with a 2PL independent items model. Because the samples used for data collection did 
not strictly adhere to the state distribution of demographics (i.e., percent limited English proficiency, 
Black, White, Latino, and eligible for free/reduced lunch), sample weights according to student 
demographics were used to inform the item and student parameter scores. 
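

As an illustration of test characteristic curve equating, the sketch below evaluates the Stocking and Lord criterion for a candidate linear transformation of the common items. It is a simplified illustration rather than the Mplus-based workflow described above; the function names, the unweighted grid of ability points, and the example item parameters are assumptions.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def tcc(theta, items):
    """Test characteristic curve: expected number correct over a set of (a, b) items."""
    return sum(p_2pl(theta, a, b) for a, b in items)


def stocking_lord_loss(A, B, base_items, new_items, thetas):
    """Mean squared difference between the base-scale TCC and the TCC of the
    common items from the new form after rescaling with slope A and intercept B
    (a* = a / A, b* = A * b + B)."""
    rescaled = [(a / A, A * b + B) for a, b in new_items]
    return sum(
        (tcc(t, base_items) - tcc(t, rescaled)) ** 2 for t in thetas
    ) / len(thetas)


# Hypothetical common items on the base scale, and the same items expressed on a
# second scale related to the base scale by theta_base = 1.2 * theta_new + 0.3.
base = [(1.0, 0.0), (1.5, 0.5)]
new = [(1.2 * a, (b - 0.3) / 1.2) for a, b in base]
grid = [x / 2.0 for x in range(-8, 9)]  # ability points from -4 to 4
```

In practice the slope and intercept are found by minimizing this loss with a numerical optimizer; here the loss is essentially zero at A = 1.2 and B = 0.3 because the second set of parameters was generated from that transformation.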


Norming Studies 


Students from several districts throughout Florida participated in the common-item, non-equivalent 
groups linking study to estimate and evaluate the item parameters and student ability score 
distributions for each of the computer adaptive tasks (CAT) in the FRA. A total of 44,780 students in 
grades 3-12 across six districts in Florida participated in the calibration and validation studies, in which 
students took the FRA tasks appropriate to their levels of performance. Table 2 provides a 
breakdown of the sample sizes used by grade level for each of the FRA adaptive assessments. Average 
demographic information for the state in grades 3-10 was as follows: 41% White, 30% Hispanic, 23% 
Black, 6% Other; 60% eligible for free/reduced price lunch; 8% limited English proficient.² Four percent 
of students were identified with a primary exceptionality as follows: Specific Learning Disabled (1.6%), 
Gifted (1.2%), Speech Impaired (.3%), Language Impaired (.2%), Emotionally Handicapped (.2%), Other 
Health Impaired (.2%), Autistic (.1%), Intellectual Disability (.1%), Orthopedically Impaired (<.1%), Deaf 
or Hard of Hearing (<.1%), and Visually Impaired (<.1%). Appendix H includes definitions for Florida’s ESE 
categories. 


² Data sources: Race data from 2013-14 Survey 3, Florida Department of Education; Free/Reduced Lunch data from 
2013-14 Survey 2 data, Florida Department of Education and Archive Data Core, Florida Center for Reading 
Research; English Language Learner data from Education Information and Accountability Services, Florida 
Department of Education and Archive Data Core, Florida Center for Reading Research. 


The demographics of our validation sample approximately reflected state demographics with respect to 
the percentages of White, Black, and Hispanic students, English language learners 
(ELL), and students eligible for free/reduced price lunch (FRL). A recurring issue in 
assessment research is that the collected sample data may not precisely reflect the population of 
interest. To correct for observed imprecision in how well a sample reflects a population, sample weights 
are used to reduce bias and compensate for over- or under-representation in the sample. 
Accordingly, our analyses were informed by weights constructed from the proportion of 
individuals in each combination of race/ethnicity, ELL status, and FRL status. This resulted in 
16 unique weights applied to the data, reflecting the four levels of race/ethnicity (White, Black, 
Hispanic, Other), two levels of FRL status (eligible/not eligible), and two levels of ELL status (ELL/not 
ELL). In this way our analyses more precisely reflected the distribution of Florida’s 
students on key demographic characteristics. Specific sample weight data used in this 
study are reported in Appendix A. 
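

A weighting scheme of this kind can be sketched as follows. The weights actually used are those reported in Appendix A; the function name, the ratio form of the weight (population share divided by sample share), and the two-cell example values are assumptions made for illustration.

```python
from itertools import product

RACE = ("White", "Black", "Hispanic", "Other")
FRL = ("eligible", "not eligible")
ELL = ("ELL", "not ELL")

# Every combination of the three characteristics: 4 x 2 x 2 = 16 cells.
CELLS = list(product(RACE, FRL, ELL))


def poststratification_weights(population_share, sample_counts):
    """Weight for each cell = population proportion / sample proportion.

    population_share: cell -> proportion of the population in that cell
    sample_counts: cell -> number of sampled students in that cell (must be > 0)
    """
    n = sum(sample_counts.values())
    return {
        cell: population_share[cell] / (count / n)
        for cell, count in sample_counts.items()
    }


# Hypothetical two-cell example: cell "A" is under-represented in the sample.
example = poststratification_weights({"A": 0.6, "B": 0.4}, {"A": 30, "B": 70})
```

Cells that are under-represented in the sample receive weights above 1 and over-represented cells receive weights below 1, so that the weighted sample reproduces the population proportions.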


Table 2 


Sample Size by Grade for FRA Tasks 


Grade    Vocabulary    Word           Syntactic    Reading
         Knowledge     Recognition    Knowledge    Comprehension


3           502           651            962           2,723
4           570           586            857           2,679
5           519           697            981           2,721
6           606           652            865           3,835
7           599           612            617           3,683
8           597           613            616           3,814
9           813         1,054          1,053           3,964
10          574         1,109            869           3,787
Total     4,780         5,974          6,820          27,206


Score Definitions 


Several different kinds of scores are provided in order to facilitate a diverse set of educational decisions. 
In this section, we describe the types of scores provided for each measure, define each score, and 
indicate its primary utility within the decision making framework of the FRA. An ability score and a 
percentile rank are provided for each task (WRT, VKT, RC, and SKT) at each time point. One probability 
of literacy success score is provided at each assessment period. 


Probability of Literacy Success (PLS). The Probability of Literacy Success score indicates the 
likelihood that a student will reach end of year expectations in literacy. For the purposes of the FRA in 
the 2014-2015 school year, reaching expectations is defined as performing at or above the 50th 
percentile on the Stanford Achievement Test, Tenth Edition (SAT-10). The PLS is used to determine 
which students are at risk of not meeting grade level expectations by the end of the school year. In addition 
to providing a precise probability of reaching grade level outcomes, the PLS is color-coded: 


• red = the student is at high risk and needs supplemental and/or intensive instruction targeted to 
the student’s skill weaknesses 


• yellow = the student may be at-risk and educators may consider differentiating instruction for 
the student and/or providing supplemental instruction 


• green = the student is likely not at-risk and will continue to benefit from strong universal 
instruction 


In the grades 3-12 FRA, the PLS is an aggregate of the individual 
student’s VKT, WRT, and RC scores. 


Percentile Ranks. Percentile ranks can vary from 1 to 99, and they divide the distribution of 
scores from a large standardization sample (in this case a representative sample of students from 
Florida) into 100 groups that contain approximately the same number of observations in each group. 
Thus, a sixth grade student who scored at the 60th percentile would have obtained a score better than 
about 60% of the students in the standardization sample. The median percentile rank on all the tests of 
the grades 3-12 FRA is 50, which means that half the students in the standardization sample obtained a 
score above that point, and half scored below it. The percentile rank is an ordinal variable meaning that 
it cannot be added, subtracted, used to create a mean score, or in any other way mathematically 
manipulated. The median is always used to describe the midpoint of a distribution of percentile ranks. 
Since this score compares a student’s performance to other students within a grade level, it is 
meaningful in determining the skill strengths and skill weaknesses for a student as compared to other 
students’ performance. 


Ability Scores. Each computer-adaptive task has an associated ability score. The ability score 
provides an estimate of a student’s development in a particular skill. This score is sensitive to changes in 
a student’s ability as skill levels increase or decrease. Ability scores in the grades 3-12 FRA span the 
development of each of four important skills: Word Recognition, Vocabulary Knowledge, Reading 
Comprehension, and Syntactic Knowledge. Each task’s vertical scale has a mean of 500 and standard 
deviation of 100. This score has an equal interval scale that can be added, subtracted, and used to create 
a mean score. Therefore, this is the score that should be used to determine the degree of growth in a 
skill for individual students. 


Reliability 


Marginal Reliability 


Reliability describes how consistent test scores will be across multiple administrations over time, as well 
as how well one form of the test relates to another. Because the FRA uses Item Response Theory (IRT) as 
its method of validation, reliability takes on a different meaning than from a Classical Test Theory (CTT) 
perspective. The biggest difference between the two approaches is the assumption made about the 
measurement error related to the test scores. CTT treats the error variance as being the same for all 
scores, whereas the IRT view is that the level of error is dependent on the ability of the individual. As 
such, reliability in IRT becomes more about the level of precision of measurement across ability, and it 
may sometimes be difficult to summarize the precision of scores in IRT with a single number. Although it 
is often more useful to graphically represent the standard error across ability levels to gauge the range of 
abilities for which the test is more or less informative, it is possible to compute a generic estimate of 
reliability known as marginal reliability (Sireci, Thissen, & Wainer, 1991) with: 


$$\bar{\rho} = \frac{\sigma_{\theta}^{2} - \bar{\sigma}_{e}^{2}}{\sigma_{\theta}^{2}}$$


where $\sigma_{\theta}^{2}$ is the variance of ability score for the normative sample and $\bar{\sigma}_{e}^{2}$ is the mean-squared error. 
Marginal reliability coefficients for the four FRA Screening tasks are reported in Table 3 by grade and 
assessment period. 


Table 3 


Marginal Reliability for FRA Screening Tasks of Vocabulary Knowledge, Word Recognition, Syntax and Reading Comprehension at the Fall, Winter, 
and Spring Administrations 


              Vocabulary Knowledge      Word Recognition          Syntax                    Reading Comprehension
Grade         Fall   Winter  Spring     Fall   Winter  Spring     Fall   Winter  Spring     Fall   Winter  Spring
3             .84    .86     .87        .73    .85     .89        .85    .87     .89        .85    .86     .83
4             .81    .83     .86        .86    .84     .88        .88    .87     .88        .76    .85     .89
5             .87    .87     .88        .87    .84     .90        .87    .88     .90        .80    .83     .90
6             .85    .85     .86        .86    .85     .91        .88    .89     .91        .84    .87     .91
7             .85    .85     .86        .86    .86     .91        .88    .89     .91        .78    .83     .91
8             .83    .84     .84        .87    .83     .92        .91    .88     .92        .81    .85     .92
9             .85    .82     .86        .88    .80     .91        .91    .87     .90        .67    .78     .91
10            .85    .81     .84        .88    .78     .90        .91    .87     .90        .76    .82     .92
All Grades    .91    .89     .90        .92    .88     .93        .93    .92     .93        .86    .88     .93


Note. Reliability coefficients for the Fall and Winter Reading Comprehension scores are reflective of fixed item administrations. Spring reliability coefficients for Reading 
Comprehension are reflective of performance on the CAT version. Marginal reliability coefficients for Vocabulary and Word Recognition are reflective of CAT versions of the 
assessments. 


Across all grades and assessment periods, the marginal reliability was quite high, ranging from .86 for fall 
reading comprehension to .93 for spring word recognition and reading comprehension. Values of .80 are 
typically viewed as acceptable for research purposes, while estimates at .90 or greater are acceptable for 
clinical decision making (Nunnally & Bernstein, 1994). Marginal reliability coefficients for the diagnostic 
Syntactic Knowledge Task are reported in Table 3. Similar to the other tasks, marginal reliability 
coefficients were quite high across all grades, ranging from .92 to .93. 
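

The marginal reliability calculation can be sketched in a few lines. This is an illustration under stated assumptions, not the code used to produce Table 3: the function name is hypothetical, the population variance is used for the ability scores, and the mean-squared error is taken as the average of the squared standard errors of the individual ability estimates.

```python
from statistics import mean, pvariance


def marginal_reliability(ability_scores, standard_errors):
    """(variance of ability - mean squared error) / variance of ability."""
    var_theta = pvariance(ability_scores)
    mse = mean(se ** 2 for se in standard_errors)
    return (var_theta - mse) / var_theta


# Hypothetical example: three students, each with a standard error of 30.
example_reliability = marginal_reliability([400, 500, 600], [30, 30, 30])
```

Because the error term is averaged over students, a test can have a high marginal reliability and still measure students at the extremes of the ability distribution with less precision.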


Standard Error of Measurement 


A standard error of measurement (SEM; Harvill, 2005) is an estimate that captures the amount of 
variance that might be observed in an individual student’s performance if they were tested repeatedly. 
That is, on any particular day of testing, an examinee’s score may fluctuate, and only through repeated 
testing is it possible to get closer to one’s true ability. Because it is not reasonable to test a student 
enough to capture his/her true ability, we can construct an interval by which we can observe the extent 
to which the score may fluctuate. The SEM is calculated with: 


$$SEM = \sigma_{x}\sqrt{1 - \rho_{xx}}$$


where $\sigma_{x}$ is the standard deviation associated with the mean for assessment x, and $\rho_{xx}$ is the marginal 
reliability for the assessment. Means and SEM are reported in Tables 4-6 for the three Screening tasks, 
respectively. 
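

A minimal sketch of the SEM calculation and of an interval built from it follows. The function names and the choice of a 95% band (z = 1.96) are assumptions; the text above states only that an interval can be constructed.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def score_band(score, sem, z=1.96):
    """Interval around an observed score; z = 1.96 gives an approximate 95% band."""
    return (score - z * sem, score + z * sem)


# Hypothetical example: SD of 100 and reliability of .84 give an SEM of 40.
sem = standard_error_of_measurement(100.0, 0.84)
low, high = score_band(500.0, sem)
```

For a student with an observed ability score of 500, this hypothetical SEM yields a band from roughly 422 to 578.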


Table 4 


Means and Standard Error of Measurement for Vocabulary Knowledge Scores 


Fall Winter Spring 
Grade N Mean SEM Mean SEM Mean SEM 
3 466 380.28 29.30 393.07 27.98 413.82 25.91 
4 486 431.77 28.42 439.80 28.63 453.59 26.85 
5 423 469.14 29.17 473.85 28.12 482.07 26.89 
6 639 492.40 29.23 498.09 29.17 505.10 27.05 
7 632 521.95 29.24 518.13 29.34 529.92 26.97 
8 681 550.11 29.60 540.88 30.88 551.98 29.40 
9 1014 555.66 29.40 560.26 32.00 562.86 28.62 
10 887 571.88 30.28 575.32 36.19 574.38 30.44 


Table 5 


Means and Standard Error of Measurement for Word Recognition Scores 


Fall Winter Spring 
Grade N Mean SEM Mean SEM Mean SEM 
3 470 341.36 29.72 351.25 29.79 377.59 24.21 
4 491 407.69 31.06 405.81 30.43 427.49 29.73 
5 426 437.77 30.92 440.94 30.42 466.91 27.06 
6 646 465.32 31.28 458.53 31.06 490.20 26.41 
7 634 498.42 32.22 482.32 31.74 518.74 27.85 
8 690 531.50 32.88 515.55 36.63 555.32 27.06 
9 1017 543.01 33.21 543.53 43.68 567.72 29.29 
10 916 574.34 33.96 558.00 47.27 591.01 32.76 
Table 6 


Means and Standard Error of Measurement for Reading Comprehension Scores 


Spring 
Grade N Mean SEM 
3 325 386.03 28.69 
4 322 440.07 32.96 
5 302 497.25 36.49 
6 431 499.96 37.63 
7 426 524.45 39.67 
8 461 571.71 48.61 
9 703 583.06 39.26 
10 626 589.72 44.65 


Note. Data is only provided for Spring due to the CAT version only being administered in the Spring. 


Means and standard error of measurement for the diagnostic Syntactic Knowledge Task are reported in 
Table 7. 


Table 7 


Means and Standard Error of Measurement for Syntactic Knowledge Scores 


Fall Winter Spring 
Grade N Mean SEM Mean SEM Mean SEM 
3 377 328.84 30.80 358.06 30.58 402.12 25.29 
4 376 403.74 30.06 417.15 30.80 452.63 24.85 
5 340 430.52 30.12 452.58 30.82 483.09 25.29 
6 383 456.01 31.18 473.15 31.59 505.59 25.04 
7 396 510.01 30.40 504.94 31.41 529.24 25.49 
8 380 523.01 30.16 533.04 34.28 554.57 25.73 
9 457 554.38 32.05 551.09 36.27 571.61 27.52 
10 443 554.98 31.07 549.89 38.55 562.49 28.15 


Test-Retest Reliability 


The extent to which a sample of students performs consistently on the same assessment across multiple 
occasions is an indication of test-retest reliability. Reliability was estimated for students participating in 
the field testing of the FRA by correlating their ability scores across three assessments. Retest correlations 
for vocabulary and word recognition (Table 8) were the strongest between winter and spring while the 
fall-winter correlations were strongest for reading comprehension. Correlations between the fall and 
spring were the lowest, which is expected as a weaker correlation from the beginning of the year to the 
end suggests that students were differentially changing over time (i.e., lower ability students may have 
grown more over time compared to higher ability students). Retest correlations for the diagnostic 
Syntactic Knowledge Task are also reported in Table 8. Similar to the Vocabulary Knowledge and Word 
Recognition Tasks, the strongest correlations between time-points were the winter-spring associations. 


Table 8 


FRA Screening Test-Retest Correlations for Vocabulary Knowledge, Word Recognition, Syntax and Reading Comprehension 


         Vocabulary Knowledge        Word Recognition            Syntax                      Reading Comprehension
         Fall-    Winter-  Fall-     Fall-    Winter-  Fall-     Fall-    Winter-  Fall-     Fall-    Winter-  Fall-
Grade    Winter   Spring   Spring    Winter   Spring   Spring    Winter   Spring   Spring    Winter   Spring   Spring


3        .59      .61      .44       .46      .51      .31       .49      .55      .48       .74      .66      .66
4        .58      .62      .51       .59      .62      .45       .62      .70      .56       .83      .77      .71
5        .75      .74      .65       .63      .73      .64       .68      .75      .68       .83      .77      .73
6        .60      .72      .51       .59      .65      .66       .63      .69      .65       .85      .80      .77
7        .66      .69      .54       .65      .69      .73       .68      .74      .69       .80      .79      .73
8        .63      .67      .63       .66      .72      .74       .66      .76      .70       .81      .79      .71
9        .65      .64      .65       .65      .68      .76       .70      .73      .80       .77      .72      .65
10       .62      .70      .64       .69      .70      .80       .67      .70      .72       .75      .74      .66


Validity 


Assessment of Model Fit 


A first step in testing the validity of scores was to evaluate the dimensionality of item responses on each 
of the FRA tasks. An important assumption in IRT is unidimensionality, which states that a score from a 
test can only have meaning if the items measure one dimension. Connected to this assumption is the 
framework of local item independence, which requires that, for a given level of individual ability, 
individual responses to a set of items are statistically independent of each other (Hattie, Krakowski, 
Rogers, & Swaminathan, 1996). McDonald (1979) suggested that a weaker principle of independence 
should be used, whereby only the covariances must be zero, and that the relationship between 
moments did not need to be considered. Stout (1990) extended the logic of weak local independence to 
argue for “essential unidimensionality” rather than subscribing to more stringent standards. Conceptually, 
Stout argued that a test is unidimensional if, for a given level of ability, the average covariance over pairs 
of items on the test is small in magnitude, as opposed to zero. Essential unidimensionality may be 
formally assessed through a variety of methods including parametric and non-parametric exploratory 
and confirmatory factor analysis. For the FRA tasks, a parametric confirmatory factor analysis was run on 
scores for different forms of each task by grade level. Because a planned missing data design was used, 
the covariance coverage was necessarily low. A planned missing data design with a large number of 
items frequently precludes a factor analysis of the full item response matrix when using the weighted 
least squares multivariate estimator. This estimator is necessary to produce commonly used fit indices 
for confirmatory factor analysis. Consequently, the factor analysis was carried out by form and grade 
within each task. The comparative fit index (CFI), Tucker-Lewis index (TLI), and root mean square error 
of approximation (RMSEA) were used to evaluate model fit for the Vocabulary Knowledge, Word 
Recognition, and Syntax Knowledge tasks. CFI and TLI values of at least .90 are considered acceptable as 
are RMSEA values less than .10. For the Reading Comprehension task, we tested the extent to which a 
unidimensional model fit better than a testlet model. The two models were compared using the AIC and 
BIC indices. 


Fit statistics for Vocabulary Knowledge, Word Recognition, and Syntax Knowledge are reported in Tables 
9, 10, and 11, respectively. Results demonstrate that item responses across forms and grades converge 
on an essentially unidimensional construct for the three tasks. 


Table 9 


Fit statistics by form and grade for the Vocabulary Knowledge Task 


Grade Form χ² df p-value RMSEA RMSEA LB RMSEA UB RMSEA p-value CFI TLI 
3 A 202.51 170 0.045 0.020 0.000 0.032 1.00 0.96 0.96 
B 175.65 152 0.092 0.019 0.000 0.031 1.00 0.97 0.96 

4 A 195.50 189 0.358 0.009 0.000 0.022 1.00 0.99 0.99 
B 214.65 189 0.097 0.017 0.000 0.027 1.00 0.97 0.97 

5 A 199.62 189 0.284 0.011 0.000 0.024 1.00 0.98 0.98 
B 169.92 170 0.487 0.000 0.000 0.022 1.00 1.00 1.00 

6 A 385.84 377 0.366 0.006 0.000 0.016 1.00 0.99 0.99 
B 441.40 377 0.012 0.017 0.008 0.023 1.00 0.96 0.96 

7 A 207.17 189 0.174 0.014 0.000 0.025 1.00 0.95 0.94 
B 219.36 189 0.064 0.018 0.000 0.028 1.00 0.98 0.98 

8 A 216.55 189 0.083 0.017 0.000 0.027 1.00 0.97 0.97 
B 228.64 189 0.026 0.021 0.008 0.029 1.00 0.94 0.93 

9 A 215.70 189 0.089 0.014 0.000 0.023 1.00 0.98 0.98 
B 225.72 189 0.035 0.017 0.005 0.002 1.00 0.96 0.96 

10 A 204.25 189 0.212 0.012 0.000 0.022 1.00 0.98 0.98 
B 232.27 170 0.001 0.028 0.018 0.037 1.00 0.89 0.88 


Note. df = degrees of freedom; RMSEA = root mean square error of approximation; LB = lower bound; UB = upper bound; CFI = comparative fit index; TLI = Tucker-Lewis index. 


Table 10 


Fit statistics by grade and form for the Word Recognition Task 


Grade Form χ² df p-value RMSEA RMSEA LB RMSEA UB RMSEA p-value CFI TLI 
3 A 233.54 152 0.000 0.042 0.031 0.052 0.91 0.93 0.92 
B 130.20 104 0.042 0.027 0.006 0.041 1.00 0.96 0.95 

4 A 99.27 65 0.004 0.044 0.025 0.061 0.71 0.90 0.87 
B 135.26 119 0.146 0.021 0.000 0.036 1.00 0.95 0.94 

5 A 173.02 152 0.117 0.020 0.000 0.030 1.00 0.96 0.95 
B 81.14 65 0.085 0.027 0.000 0.044 0.99 0.94 0.93 

6 A 478.14 377 0.000 0.020 0.014 0.026 1.00 0.93 0.93 
B 425.31 350 0.004 0.018 0.011 0.024 1.00 0.94 0.94 

7 A 189.75 152 0.020 0.029 0.012 0.041 1.00 0.90 0.89 
B 86.31 90 0.590 0.000 0.000 0.028 1.00 1.00 1.00 

8 A 179.94 152 0.060 0.025 0.000 0.038 1.00 0.91 0.90 
B 154.74 135 0.118 0.022 0.000 0.036 1.00 0.95 0.94 

9 A 198.25 152 0.007 0.024 0.013 0.032 1.00 0.96 0.95 
B 140.16 152 0.745 0.000 0.000 0.016 1.00 1.00 1.00 

10 A 196.33 152 0.009 0.025 0.013 0.034 1.00 0.92 0.91 
B 102.48 77 0.028 0.029 0.010 0.040 1.00 0.88 0.86 

C 404.31 377 0.159 0.017 0.000 0.029 1.00 0.95 0.94 


Note. df = degrees of freedom; RMSEA = root mean square error of approximation; LB = lower bound; UB = upper bound; CFI = comparative fit index; TLI = Tucker-Lewis index. 


Table 11 


Fit statistics by grade and form for the Syntax Knowledge Task 


Grade Form χ² df p-value RMSEA RMSEA LB RMSEA UB RMSEA p-value CFI TLI 
3 A 189.18 170 0.149 0.011 0.000 0.019 1.00 0.94 0.93 
B 198.78 152 0.007 0.018 0.010 0.024 1.00 0.96 0.96 

4 A 188.69 135 0.001 0.022 0.014 0.029 1.00 0.90 0.88 
B 167.71 152 0.182 0.011 0.000 0.020 1.00 0.97 0.97 

5 A 211.22 170 0.017 0.016 0.007 0.022 1.00 0.92 0.91 
B 177.81 152 0.075 0.013 0.000 0.021 1.00 0.97 0.96 

6 A 205.98 170 0.031 0.016 0.005 0.023 1.00 0.96 0.95 
B 293.34 230 0.003 0.018 0.011 0.024 1.00 0.95 0.94 

C 231.39 170 0.001 0.020 0.013 0.027 1.00 0.93 0.93 

7 A 160.33 170 0.691 0.000 0.000 0.015 1.00 1.00 1.00 
B 176.75 170 0.345 0.008 0.000 0.020 1.00 0.98 0.97 

8 A 304.36 170 0.000 0.036 0.029 0.042 1.00 0.82 0.80 
B 275.77 135 0.000 0.041 0.034 0.048 0.98 0.77 0.74 

9 A 184.00 170 0.219 0.009 0.000 0.017 1.00 0.99 0.99 
B 221.00 170 0.005 0.017 0.010 0.023 1.00 0.92 0.91 

10 A 199.47 170 0.061 0.014 0.000 0.022 1.00 0.93 0.93 
B 160.32 135 0.068 0.015 0.000 0.023 1.00 0.88 0.86 


Note. df = degrees of freedom; RMSEA = root mean square error of approximation; LB = lower bound; UB = upper bound; CFI = comparative fit index; TLI = Tucker-Lewis index. 


Model fit comparisons between the unidimensional and testlet models for the Reading Comprehension 
Task are reported in Table 12. 


Table 12 


AIC and BIC values for the unidimensional and testlet models in Reading Comprehension by grade 


Grade Model AIC BIC adjusted-BIC 
3 Unidimensional 103845 106019 104851 
Testlet 103672 106928 105177 
4 Unidimensional 113842 115987 114830 
Testlet 113553 116765 115033 
5 Unidimensional 101720 130349 102539 
Testlet 101471 104130 102700 
6 Unidimensional 151414 153927 152649 
Testlet 150809 154579 152663 
7 Unidimensional 121206 123155 122158 
Testlet - - - 
8 Unidimensional 141907 144093 142981 
Testlet 141541 144820 143153 
9 Unidimensional 143848 146261 145041 
Testlet 143673 147293 145463 
10 Unidimensional 122108 124454 123259 
Testlet 121811 125330 123538 


Note. Grade 7 Testlet model did not converge. 


Results from this comparison based on AIC and BIC were mixed. The AIC suggests that the testlet model 
should be used, while the BIC and adjusted BIC values were smaller for the unidimensional model. 
Although the indices provide mixed information, the penalty term is greater in the BIC compared to the 
AIC. Due to the penalty difference, the BIC is a more conservative estimate, and given the results above it 
was deemed more appropriate for model selection. Consequently, the unidimensional model was 
retained. 
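

The difference in penalty terms can be made explicit with a short sketch. The AIC and BIC formulas are the standard definitions; the function names are hypothetical, and the index values in the example are the grade 3 entries from Table 12.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: penalty of 2 per estimated parameter."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: penalty of ln(n) per estimated parameter."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def preferred_model(index_by_model):
    """Return the model with the smallest value of an information criterion."""
    return min(index_by_model, key=index_by_model.get)


# Grade 3 values from Table 12.
grade3_aic = {"unidimensional": 103845, "testlet": 103672}
grade3_bic = {"unidimensional": 106019, "testlet": 106928}
```

Because ln(n) exceeds 2 for any sample of eight or more observations, the BIC penalizes the additional parameters of the testlet model more heavily than the AIC does, which is why the two indices can point to different models.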


Criterion Validity 


Criterion validity describes how well scores on one assessment relate to other theoretically relevant 
constructs, both concurrently and predictively. Concurrent validity was evaluated by correlating scores 
from the tasks with each other, while predictive validity was evaluated by using the FRA tasks to 
predict later reading comprehension performance on the SAT-10. 


Concurrent Validity 


Reading and language skills tend to be moderately associated with one another; thus, we expected the 
FRA Vocabulary Knowledge, Word Recognition, and Syntactic Knowledge Tasks to show 
stronger associations with reading comprehension and more moderate 
associations with each other. Correlation results are reported in Table 13. 


Table 13 


Bivariate Associations among FRA Tasks 


Grade  Measure                  Reading          Vocabulary    Word           Syntax
                                Comprehension                  Recognition


3      Reading Comprehension    1.00
       Vocabulary Knowledge      .60             1.00
       Word Recognition          .42              .37          1.00
       Syntax Knowledge          .48              .38           .30           1.00
4      Reading Comprehension    1.00
       Vocabulary Knowledge      .42             1.00
       Word Recognition          .43              .30          1.00
       Syntax Knowledge          .52              .35           .29           1.00
5      Reading Comprehension    1.00
       Vocabulary Knowledge      .58             1.00
       Word Recognition          .40              .37          1.00
       Syntax Knowledge          .57              .44           .31           1.00
6      Reading Comprehension    1.00
       Vocabulary Knowledge      .54             1.00
       Word Recognition          .48              .36          1.00
       Syntax Knowledge          .58              .45           .36           1.00
7      Reading Comprehension    1.00
       Vocabulary Knowledge      .46             1.00
       Word Recognition          .45              .38          1.00
       Syntax Knowledge          .60              .44           .42           1.00
8      Reading Comprehension    1.00
       Vocabulary Knowledge      .49             1.00
       Word Recognition          .49              .40          1.00
       Syntax Knowledge          .59              .44           .46           1.00
9      Reading Comprehension    1.00
       Vocabulary Knowledge      .53             1.00
       Word Recognition          .55              .53          1.00
       Syntax Knowledge          .63              .58           .54           1.00
10     Reading Comprehension    1.00
       Vocabulary Knowledge      .50             1.00
       Word Recognition          .49              .51          1.00
       Syntax Knowledge          .59              .55           .57           1.00


Predictive Validity 


The predictive validity of the Screening tasks to the SAT-10 Reading Comprehension test for grades 3-12 
was addressed through a series of linear and logistic regressions. The linear regressions were run two 
ways. First, a correlation analysis was used to evaluate the strength of relations between each of the 
Screening tasks’ ability scores with the SAT-10. Second, a multiple regression was run to estimate the 
total amount of variance that the linear combination of the predictors explained in SAT-10 reading 
comprehension performance. Results from the linear regression analyses are reported in Table 14. 


Table 14 


Bivariate Correlations between FRA Screening Tasks and SAT-10. Percent Variance Explained in SAT-10 by 
FRA Vocabulary, Word Recognition, and Reading Comprehension 


         Vocabulary    Word           Reading
Grade    Knowledge     Recognition    Comprehension    Total R²
3        .56           .43            .74              .62
4        .45           .39            [illegible]      .56
5        .57           .41            .74              .59
6        .53           .46            [illegible]      .53
7        .43           .43            .66              .45
8        .46           .47            .67              .48
9        .51           .55            .60              .47
10       .47           [illegible]    .57              .39


For the logistic regressions, students’ performance on the SAT-10 Reading Comprehension test was 
coded as ‘1’ for performance at or above the 50th percentile, and ‘0’ for scores below this target. This 
dichotomous variable was then regressed on a combination of vocabulary knowledge, word recognition, 
and reading comprehension scores at each grade level. Further, we evaluated the classification accuracy 
of scores from the FRA as it pertains to risk status on the SAT-10. By dichotomizing the combination of 
screening task scores as ‘1’ for not at-risk for reading difficulties and ‘0’ for at-risk for reading difficulties, 
students could be classified based on their dichotomized performances on both measures. As such, students could 
be identified as not at-risk on the combination of screening tasks and demonstrating grade level 
performance on the SAT-10 (i.e., specificity or true-negatives), at-risk on the combination of screening 
task scores and below grade level performance on the SAT-10 (i.e., sensitivity or true-positives), not 
at-risk based on the combination of screening task scores and not at grade level on the SAT-10 (i.e., false 
negative error), or at-risk on the combination of screening task scores and at grade level on the SAT-10 
(i.e., false positive error). Classification of students in these categories allows for the evaluation of 
cut-points on the combination of screening tasks (i.e., PLS) to determine which PLS cut-point maximizes 
predictive power. 


The concept of risk can be viewed in many ways, including as a “percent chance,” which is a 
number between 0 and 100, with 0 meaning there is no chance that a student will develop a problem, 
and 100 meaning there is no chance the student will not develop a problem. When attempting to identify 
children who are “at-risk” for poor performance on some type of future measure of reading 
achievement, this is typically a yes/no decision based upon a “cut-point” along a continuum of risk. 
Oftentimes this future measure of achievement is a state’s high-stakes assessment, which typically 
provides a standard score that describes the performance of each student. Grade-level cut-points are 
chosen that determine whether a student has passed or failed the state-wide assessment. 


Decisions concerning appropriate cut-points for screening measures are made based on the level of 
correct classification that is desired from the screening assessments. While a variety of statistics may be 
used to guide such choices (e.g., sensitivity, specificity, positive and negative predictive power; see 
Schatschneider, Petscher, & Williams, 2008), negative predictive power was utilized to develop the FRA 
cut-points. Negative predictive power is the percentage of students identified as “not at-risk” 
on the screening assessments who end up passing the outcome assessment. Predictive power 
is not considered to be a property of the screening assessments since it is known to fluctuate given the 
proportion of individuals who are at-risk on the selected outcome (Streiner, 2003). 
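

The classification statistics discussed above can all be derived from a 2 x 2 table of screening decision by outcome. The sketch below uses the convention in this section, in which a "positive" is a student identified as at-risk; the function name and the counts in the example are hypothetical and were chosen only so that the results resemble a typical row of a classification accuracy table.

```python
def classification_accuracy(tp, fp, fn, tn):
    """Classification statistics from a 2 x 2 screening-by-outcome table.

    tp: at-risk on the screen, below grade level on the outcome
    fp: at-risk on the screen, at grade level on the outcome
    fn: not at-risk on the screen, below grade level on the outcome
    tn: not at-risk on the screen, at grade level on the outcome
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_power": tp / (tp + fp),
        "negative_predictive_power": tn / (tn + fn),
        "overall_correct_classification": (tp + tn) / total,
        "base_rate": (tp + fn) / total,
    }


# Hypothetical counts for 1,000 students.
stats = classification_accuracy(tp=349, fp=183, fn=61, tn=407)
```

Sensitivity and specificity are computed within outcome groups and do not depend on the base rate, whereas positive and negative predictive power are computed within screening groups and therefore shift as the base rate changes.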


As it pertains to the FRA, we evaluated various cut-points on the PLS that would result in a minimum
value of .85 negative predictive power for grades 3-10. Results from this analysis (Table 15) showed that
a PLS cut-point of .70 could be used to meet the .85 negative predictive power threshold.
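One way to picture the cut-point evaluation is as a scan over candidate PLS values, computing negative predictive power at each. The sketch below assumes a selection rule (the lowest candidate that meets the target) that the report does not state, and the scores, outcomes, and function names are hypothetical.

```python
def npp_at_cut(pls_scores, passed, cut):
    """NPP when students with PLS >= cut are labeled not at-risk."""
    not_at_risk = [p for s, p in zip(pls_scores, passed) if s >= cut]
    if not not_at_risk:
        return None  # nobody cleared the cut, so NPP is undefined
    return sum(not_at_risk) / len(not_at_risk)


def lowest_cut_meeting_target(pls_scores, passed, candidates, target=0.85):
    """Assumed rule: return the smallest candidate cut whose NPP meets the target."""
    for cut in sorted(candidates):
        npp = npp_at_cut(pls_scores, passed, cut)
        if npp is not None and npp >= target:
            return cut, npp
    return None


# Hypothetical PLS values and pass (1) / fail (0) outcomes.
pls = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.75, 0.8, 0.9, 0.95]
passed = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]
result = lowest_cut_meeting_target(pls, passed, [0.5, 0.6, 0.7, 0.8])
```

Raising the cut generally raises negative predictive power at the cost of flagging more students as at-risk, so a real analysis would weigh the target against the other indices in Table 15.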


Table 15 


Classification Accuracy of the Probability of Literacy Success (PLS) in Grades 3-12 using .85 and .70 Cut- 
Points 


Cut-Point Grade SE SP PPP NPP OCC Base Rate
.70 3 .85 .69 .66 .87 .76 .41
4 .77 .74 .59 .88 .75 .32
5 .83 .76 .65 .89 .78 .35
6 .92 .56 .68 .87 .86 .50
7 .91 .60 .61 .91 .73 .40
8 .85 .67 .62 .88 .74 .39
9 .76 .69 .45 .90 .71 .25
10 .64 .74 .49 .84 .71 .28


Note. SE= Sensitivity, SP = Specificity, PPP = Positive Predictive Power, NPP = Negative Predictive Power, OCC = Overall Correct 
Classification. Students in Grades 11 and 12 are classified according to Grade 10 criteria. 
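The dependence of predictive power on the base rate can be made concrete with the standard identities linking sensitivity, specificity, and base rate to PPP, NPP, and OCC. The sketch below applies them to the Grade 3 values in Table 15 (SE = .85, SP = .69, base rate = .41); the function name is hypothetical, and the second call uses an invented base rate of .20 purely to show the shift.

```python
def predictive_power(se, sp, base_rate):
    """Derive PPP, NPP, and OCC from sensitivity, specificity, and base rate."""
    tp = se * base_rate              # proportion correctly flagged at-risk
    fn = (1 - se) * base_rate        # proportion missed by the screen
    tn = sp * (1 - base_rate)        # proportion correctly cleared
    fp = (1 - sp) * (1 - base_rate)  # proportion flagged unnecessarily
    return {"ppp": tp / (tp + fp), "npp": tn / (tn + fn), "occ": tp + tn}


grade3 = predictive_power(se=0.85, sp=0.69, base_rate=0.41)
# Same screen, hypothetical lower base rate: NPP rises and PPP falls.
low_base = predictive_power(se=0.85, sp=0.69, base_rate=0.20)
```

Rounded to two decimals, `grade3` gives PPP = .66, NPP = .87, and OCC = .76, in agreement with the Grade 3 row.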
Differential Accuracy of Prediction


An additional component of checking the validity of cut-points and scores on the assessments involved 
testing differential accuracy of the regression equations across different demographic groups. This 
procedure involved a series of logistic regressions predicting success on the SAT-10 test (i.e., at or above
the 50th percentile). The independent variables included a variable that represented whether students
were identified as not at-risk (PLS ≥ .70; coded as ‘1’) or at-risk (PLS < .70; coded as ‘0’) on the
combination of screening task scores, a variable that represented a selected demographic group, as well 
as an interaction term between the two variables. A statistically significant interaction term would 
suggest that differential accuracy in predicting end-of-year performance existed for different groups of 
individuals based on the risk status determined by the screening assessment. For the combination of 
FRA screening task scores, differential accuracy was separately tested for Black and Latino students as 
well as for students identified as English Language Learners (ELL) and students who were eligible for 
Free/Reduced Price Lunch (FRL). 
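Because both predictors in this design are binary, the logistic model with an interaction term is saturated, and its maximum-likelihood estimates can be written directly as log-odds contrasts among the four cells. The sketch below relies on that property and on the Wald test with one degree of freedom; the cell counts, the function names, and the 0/1 coding of the group variable are hypothetical and are not taken from the FRA data.

```python
import math


def logit(successes, failures):
    """Log odds of success within one cell."""
    return math.log(successes / failures)


def saturated_logit(cells):
    """cells[(pls, group)] = (n_success, n_failure), with pls and group in {0, 1}."""
    l00 = logit(*cells[(0, 0)])
    l10 = logit(*cells[(1, 0)])
    l01 = logit(*cells[(0, 1)])
    l11 = logit(*cells[(1, 1)])
    interaction = l11 - l10 - l01 + l00
    # Standard error of the interaction: root of the summed reciprocal counts.
    se = math.sqrt(sum(1 / s + 1 / f for s, f in cells.values()))
    wald = (interaction / se) ** 2
    p_value = math.erfc(math.sqrt(wald / 2))  # chi-square tail, 1 df
    return {
        "intercept": l00,
        "pls": l10 - l00,
        "group": l01 - l00,
        "interaction": interaction,
        "se": se,
        "wald": wald,
        "p": p_value,
    }


# Hypothetical counts of (above, below) the 50th percentile in each cell.
example = saturated_logit({
    (0, 0): (30, 70),
    (1, 0): (90, 10),
    (0, 1): (20, 80),
    (1, 1): (60, 40),
})
```

A statistically significant `interaction` in this sketch corresponds to the situation described above: the gain in log odds of success associated with being above the cut-point differs between the two groups.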


When testing for differential accuracy between Black and White students (Table 16), a significant effect 
for the interaction between the PLS cut-point and minority status existed in grade 4 (p = .005). This 
finding indicated that for the sample tested at the winter assessment period, White students with a PLS 
above the cut-point had a greater chance of being at or above the 50th percentile on the SAT-10
compared to Black students above the cut-point on the PLS. We note that replication will be needed 
across multiple administrations with a larger sample to evaluate the extent to which this phenomenon 
continues to exist. 


No significant differential accuracy was found between Hispanic and White students (Table 17), ELL and non-ELL
students (Table 18), or students eligible for FRL and those who were not (Table 19).


Table 16 


Differential Accuracy for FRA Screening Tasks by Grade: Black-White (BW) 


Grade Parameter df Estimate SE χ² p-value
3 Intercept 1 0.15 0.28 0.30 0.580 
PLS 1 4.64 1.34 11.86 <.001 
BW 1 -0.60 0.34 3.06 0.080 
PLS *BW 1 -2.28 1.42 2.56 0.109 
4 Intercept 1 -0.60 0.34 3.05 0.080 
PLS 1 2.99 0.48 38.28 <.001 
BW 1 0.45 0.41 1.19 0.274
PLS *BW 1 -1.67 0.60 7.61 0.005
5 Intercept 1 -0.37 0.26 1.99 0.157
PLS 1 3.83 0.63 36.26 <.001
BW 1 -0.22 0.35 0.41 0.517
PLS *BW 1 -1.43 0.74 3.70 0.054
6 Intercept 1 -0.09 0.20 0.19 0.657
PLS 1 2.82 0.46 36.31 <.001
BW 1 -0.88 0.32 7.21 0.007
PLS *BW 1 -0.27 0.80 0.11 0.735
7 Intercept 1 -0.39 0.22 3.23 0.071
PLS 1 3.27 0.50 42.28 <.001
BW 1 -0.00 0.33 0.00 0.985
PLS *BW 1 -0.39 0.78 0.25 0.612
8 Intercept 1 -0.20 0.25 0.67 0.410
PLS 1 2.42 0.43 31.30 <.001
BW 1 0.11 0.38 0.08 0.771
PLS *BW 1 -1.04 0.70 2.24 0.134
9 Intercept 1 0.24 0.24 0.94 0.329
PLS 1 2.66 0.46 32.48 <.001
BW 1 -0.42 0.37 1.30 0.253
PLS *BW 1 -0.26 0.64 0.17 0.679
10 Intercept 1 0.55 0.27 4.11 0.042
PLS 1 1.82 0.37 24.04 <.001
BW 1 -0.79 0.37 4.52 0.033
PLS *BW 1 0.48 0.56 0.74 0.387


Note. PLS cut-off is .70. PLS scores are based on student performance at the winter administration. 


Table 17 


Differential Accuracy for Screening Tasks by Grade: Hispanic-White (HW) 


Grade Parameter df Estimate SE χ² p-value
3 Intercept 1 0.15 0.28 0.30 0.580 
PLS 1 4.64 1.34 11.86 <.001 

HW 1 -0.79 0.31 6.32 0.011 
PLS*HW 1 -1.70 1.39 1.49 0.222 

4 Intercept 1 -0.60 0.34 3.05 0.080 
PLS 1 2.99 0.48 38.28 <.001 

HW 1 0.19 0.37 0.25 0.610 
PLS*HW 1 -0.46 0.55 0.69 0.405 

5 Intercept 1 -0.37 0.26 1.99 0.157 
PLS 1 3.83 0.63 36.26 <.001 

HW 1 -0.33 0.31 1.11 0.291 
PLS*HW 1 -0.86 0.69 1.53 0.215 

6 Intercept 1 -0.09 0.20 0.19 0.657 
PLS 1 2.82 0.46 36.31 <.001 

HW 1 -0.80 0.25 9.98 <.001 
PLS*HW 1 -0.08 0.58 0.02 0.886 

7 Intercept 1 -0.39 0.22 3.23 0.071 
PLS 1 3.27 0.50 42.28 <.001 




HW 1 -0.15 0.27 0.30 0.577 


PLS*HW 1 -0.63 0.61 1.08 0.298 
8 Intercept 1 -0.20 0.25 0.67 0.410 
PLS 1 2.42 0.43 31.30 <.001 
HW 1 -0.47 0.31 2.29 0.129 
PLS*HW 1 0.77 0.60 1.60 0.205 
9 Intercept 1 0.24 0.24 0.94 0.329 
PLS 1 2.66 0.46 32.48 <.001 
HW 1 -0.02 0.31 0.00 0.934 
PLS*HW 1 -0.56 0.57 0.97 0.324 
10 Intercept 1 0.55 0.27 4.11 0.042 
PLS 1 1.82 0.37 24.04 <.001 
HW 1 -0.66 0.33 4.01 0.045 
PLS*HW 1 0.53 0.48 1.21 0.270 


Note. PLS cut-off is .70. PLS scores are based on student performance at the winter administration. 
Table 18 


Differential Accuracy for FRA Screening Tasks by Grade: English Language Learners (ELL) 


Grade Parameter df Estimate SE χ² p-value
3 Intercept 1 -0.23 0.11 4.09 0.043 
PLS 1 2.74 0.27 102.81 <.001 

ELL 1 -1.18 0.30 14.91 <.001 

PLS*ELL 1 13.93 743.00 0.00 0.985 

4 Intercept 1 -0.10 0.14 0.58 0.445 
PLS 1 2.10 0.21 99.65 <.001 
ELL 1 -0.99 0.29 11.04 <.001
PLS*ELL 1 0.22 0.88 0.06 0.802
5 Intercept 1 -0.49 0.14 12.17 0.000
PLS 1 2.97 0.23 159.44 <.001
ELL 1 -0.42 0.26 2.63 0.104
PLS*ELL 1 -1.00 0.66 2.25 0.133
6 Intercept 1 -0.32 0.12 7.17 0.007
PLS 1 2.61 0.26 95.81 <.001
ELL 1 -1.71 0.32 28.18 <.001
PLS*ELL 1 -0.58 0.83 0.50 0.478
7 Intercept 1 -0.13 0.12 1.10 0.293
PLS 1 2.80 0.28 94.38 <.001
ELL 1 -1.51 0.31 23.75 <.001
PLS*ELL 1 -0.55 0.70 0.60 0.437
8 Intercept 1 0.04 0.15 0.08 0.772
PLS 1 2.23 0.27 65.49 <.001
ELL 1 -1.66 0.33 25.51 <.001
PLS*ELL 1 0.57 1.03 0.30 0.582
9 Intercept 1 0.19 0.14 1.85 0.173
PLS 1 2.31 0.23 95.15 <.001
ELL 1 -0.44 0.35 1.61 0.204
PLS*ELL 1 -1.66 0.92 3.25 0.071
10 Intercept 1 0.17 0.14 1.40 0.236
PLS 1 2.21 0.21 105.95 <.001




ELL 1 -0.88 0.33 6.75 0.009 


PLS*ELL 1 -1.25 0.70 3.19 0.073 


Note. PLS cut-off is .70. PLS scores are based on student performance at the winter administration. 
Table 19 


Differential Accuracy for Screening Tasks by Grade: Free or Reduced Price Lunch (FRL) 


Grade Parameter df Estimate SE χ² p-value
3 Intercept 1 0.83 0.32 6.47 0.011 
PLS 1 3.04 0.85 12.65 <.001 

FRL 1 -1.46 0.34 17.84 <.001 
PLS*FRL 1 -0.18 0.90 0.04 0.836 

4 Intercept 1 1.04 0.42 6.00 0.014 
PLS 1 1.54 0.54 7.93 0.004 

FRL 1 -1.55 0.44 12.26 <.001
PLS*FRL 1 0.73 0.58 1.54 0.214

5 Intercept 1 -0.12 0.35 0.12 0.721
PLS 1 2.55 0.47 28.86 <.001

FRL 1 -0.55 0.37 2.15 0.141 
PLS*FRL 1 0.49 0.53 0.85 0.354 

6 Intercept 1 -0.31 0.24 1.65 0.197 
PLS 1 2.61 0.42 38.51 <.001 

FRL 1 -0.45 0.27 2.84 0.091 
PLS*FRL 1 0.14 0.52 0.07 0.780 

7 Intercept 1 0.12 0.23 0.29 0.588 
PLS 1 3.12 0.53 34.83 <.001 
FRL 1 -0.74 0.27 7.69 0.005
PLS*FRL 1 -0.50 0.60 0.70 0.401
8 Intercept 1 0.14 0.28 0.26 0.606
PLS 1 2.25 0.48 21.33 <.001
FRL 1 -0.70 0.32 4.72 0.029
PLS*FRL 1 0.44 0.57 0.60 0.436
9 Intercept 1 0.27 0.22 1.44 0.230
PLS 1 2.54 0.37 46.89 <.001
FRL 1 -0.22 0.27 0.67 0.410
PLS*FRL 1 -0.44 0.46 0.91 0.338
10 Intercept 1 0.02 0.20 0.01 0.890
PLS 1 2.63 0.31 70.35 <.001
FRL 1 -0.04 0.26 0.02 0.863
PLS*FRL 1 -0.68 0.40 2.86 0.090


Note. PLS cut-off is .70. PLS scores are based on student performance at the winter administration. 
Construct validity describes how well scores from an assessment measure the construct it is intended to
measure. Components of construct validity include convergent validity, which can be evaluated by
testing relations between a developed assessment and another related assessment, and discriminant
validity, which can be evaluated by correlating scores from a developed assessment with an unrelated
assessment. The goal of the former is to yield a high association, which indicates that the developed
measure converges on, or is empirically linked to, the intended construct. The goal of the latter is to yield a
lower association, which indicates that the developed measure is unrelated to a particular construct of
interest.
Convergent validity. Data were collected in two large school districts in central Florida with
four elementary schools, three middle schools, and two high schools. A total of 1,825 students in grades 
3 through 10 were administered the four tasks in the FRA and gold standard clinical norm-referenced 
assessments of word reading (Test of Word Reading Efficiency — 2, Wagner, Torgesen, & Rashotte, 
2012), vocabulary (Peabody Picture Vocabulary Test — 4, Dunn & Dunn, 2007), and syntax (the 
Grammaticality Judgment Test of the Comprehensive Assessment of Spoken Language, Carrow- 
Woolfolk, 2008). 


Students’ abilities to derive word meanings receptively were measured by the VKT and the Peabody
Picture Vocabulary Test-4 (PPVT-4; Dunn & Dunn, 2007). The PPVT-4 is used frequently as a normative 
measure and as a diagnostic. The PPVT-4 requires students to point to a picture, from a group of four 
pictures, which best represents a word spoken by the examiner. The PPVT-4 manual reports high 
reliability, with internal consistency reliability ranging from .92 to .98. The PPVT-4 also demonstrates 
high convergent validity to other measures, with correlations ranging from .80 to .83 with the Expressive 
Vocabulary Test (Williams, 2007) and correlations with the Clinical Evaluation of Language Fundamentals 
(Semel, Wiig, & Secord, 2003) ranging from .67 to .79. 


Students’ abilities to use the structure of sentences to comprehend the sentences’ meaning were
measured by the SKT and the Grammaticality Judgment subtest (GJT) of the Comprehensive Assessment
of Spoken Language (CASL; Carrow-Woolfolk, 2008). The CASL is most frequently used by speech-
language pathologists to determine instructional/therapy goals for students with diagnostic weaknesses
in language skills such as syntax. In the GJT, students were orally presented sentences with and without
grammatical errors and asked to indicate whether or not there were errors. The items have an additional
component asking students to fix any perceived errors in the sentence without changing its meaning. 
The GJT subtest has high internal consistency reliability ranging from .85 to .94 and high criterion- 
related validity with other oral language assessments within the CASL. The manual reports that, after 
correcting for variability between norm groups, the GJT correlates to the Listening Comprehension and 
Oral Expression Scales (Carrow-Woolfolk, 1995) Oral Composite score at .75. 


Word recognition was measured by the WRT and compared to performance on measures of decoding
fluency, the Sight Word Efficiency and Phonemic Decoding Efficiency subtests of the Test of Word
Reading Efficiency-2 (TOWRE-2; Wagner, Torgesen, & Rashotte, 2012). The TOWRE-2 was designed to
monitor the progress of students receiving additional instruction for weaknesses in word reading
abilities and has demonstrated discrimination between low-performing students with language and
reading disabilities (Wagner, Torgesen, & Rashotte, 2012). When administering this assessment, the
examiner asks students to read nonwords and sight words aloud as quickly as possible within 45
seconds. The alternate-forms reliability coefficient ranges from .82 to .94, and average test-retest
coefficients among forms exceed .90. Correlations with other measures of word reading are high, such
as the Letter-Word Identification subtest of the Woodcock-Johnson III (r = .76; Woodcock, McGrew, &
Mather, 2001), reading fluency (r = .91) on the Gray Oral Reading Test-4th ed. (GORT-4; Wiederholt &
Bryant, 2001), the Test of Silent Contextual Reading Fluency (TOSCRF; Hammill, Wiederholt, & Allen, 2006; r
= .75), and the Woodcock Reading Mastery Test-Revised (WRMT-R; Woodcock, 1987) Passage
Comprehension (r = .88).


Relations between the FRA Reading Comprehension Task and the SAT-10 Reading Comprehension are
found in Table 14. Correlations in Table 20 demonstrate that moderate associations exist between the FRA
Vocabulary Knowledge Task and the PPVT-IV. The average correlation across grade levels was .52, with a
range of .47 to .67. Correlations between the FRA Word Recognition Task and the Real Word
component of the TOWRE were moderate as well. The average correlation across
grade levels was .33, with a range of .24 to .49. Correlations between the FRA Word Recognition Task and
the Non-Word component of the TOWRE were moderate. The average correlation across grade
levels was .38, with a range of .30 to .47. Correlations between the FRA Syntax Knowledge Task and the
GJT were moderate. The average correlation across grade levels was .49, with a range of .37 to .61.


Table 20 


Correlations between FRA scores and the PPVT-IV, GJT, and TOWRE 


Grade N FRA Task PPVT-IV GJT TOWRE Real Word TOWRE Non-Word
3 251 Vocabulary Knowledge 0.47 0.40 0.37 0.29
Syntax Knowledge 0.54 0.49 0.34 0.28 
Word Recognition 0.27 0.31 0.42 0.43 
4 161 Vocabulary Knowledge 0.56 0.57 0.50 0.44 
Syntax Knowledge 0.60 0.61 0.35 0.33 
Word Recognition 0.36 0.40 0.45 0.45 
5 167 Vocabulary Knowledge 0.61 0.51 0.35 0.39 
Syntax Knowledge 0.56 0.47 0.33 0.32 
Word Recognition 0.22 0.10 0.24 0.30 
6 113 Vocabulary Knowledge 0.62 0.53 0.41 0.44 
Syntax Knowledge 0.52 0.44 0.20 0.20 
Word Recognition 0.36 0.26 0.49 0.47 
7 72 Vocabulary Knowledge 0.58 0.50 0.43 0.33 
Syntax Knowledge 0.50 0.49 0.30 0.28 
Word Recognition 0.34 0.31 0.46 0.51 
8 71 Vocabulary Knowledge 0.50 0.53 0.36 0.45 
Syntax Knowledge 0.74 0.51 0.33 0.47 
Word Recognition 0.41 0.45 0.28 0.46 
9 227 Vocabulary Knowledge 0.65 0.55 0.27 0.29
Syntax Knowledge 0.35 0.37 0.25 0.27 
Word Recognition 0.39 0.25 0.35 0.43 
10 169 Vocabulary Knowledge 0.67 0.61 0.36 0.44 
Syntax Knowledge 0.52 0.56 0.34 0.38 
Word Recognition 0.40 0.40 0.28 0.36 
Note. PPVT-IV = Peabody Picture Vocabulary Test, 4th Edition; GJT = Grammaticality Judgment Test; TOWRE = Test of Word
Reading Efficiency.


A secondary analysis of convergent validity evaluated the extent to which the correlations between the 
FRA and the PPVT-IV, GJT, and TOWRE tasks varied dependent on one’s level of ability. Because 
traditional correlations are representative of average associations, it is possible that the average does 
not best characterize relations for students with low, average, and high ability levels. For example, it is 
plausible that at low levels of the GJT, a stronger correlation exists between the GJT and the FRA Syntax 
Knowledge compared to a weaker correlation at higher levels of the GJT. Because the GJT is a clinical
measure of syntax knowledge, it is designed for students who are presumed to be deficient in this skill.
The GJT is not typically administered to students with average or high syntax skills; therefore, reporting
the average correlation between scores on the GJT and the FRA Syntactic Knowledge could mask a
stronger association for students with poor syntax skills. Typical regression models are ill-equipped to
test for differential correlations across the range of scores for an outcome variable. Rather, quantile
regression (Koenker & Bassett, 1978; Petscher & Logan, 2014; Petscher, Logan, & Zhou, 2013) is suited
to estimating the correlation between measures conditional on performance on the outcome. In this
manner we tested the extent to which: 1) the correlation between the FRA Vocabulary Knowledge and 
PPVT-IV varied for students with low, average, and high PPVT-IV scores; 2) the correlation between the 
FRA Word Recognition and TOWRE-Real Word varied for students with low, average, and high TOWRE 
Real Word scores; 3) the correlation between the FRA Word Recognition and TOWRE Non-Word varied 
for students with low, average, and high TOWRE Non-Word scores; and 4) the correlation between the 
FRA Syntactic Knowledge and GJT varied for students with low, average, and high GJT scores.
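To illustrate how quantile regression differs from a regression on the mean, the sketch below fits a straight line at a chosen quantile by minimizing the check (pinball) loss. It uses the property that, for a single predictor, an optimal line passes through at least two observations, so small data sets can be solved by enumerating pairs of points. The function names and data are hypothetical, this is not the authors' software, and the sketch stops at the conditional slope: how the report converts quantile-specific estimates into correlations is not described in this section.

```python
from itertools import combinations


def pinball_loss(residual, tau):
    """Check loss: weight tau above the line, (1 - tau) below it."""
    return residual * tau if residual >= 0 else residual * (tau - 1)


def quantile_line(xs, ys, tau):
    """Exact simple quantile regression by enumerating two-point lines."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line, no finite slope
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(
            pinball_loss(y - (intercept + slope * x), tau)
            for x, y in zip(xs, ys)
        )
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Hypothetical scores: screening task (x) against a clinical measure (y).
task = [1, 1, 2, 2, 3, 3, 4, 4]
clinical = [1, 2, 2, 4, 3, 6, 4, 8]
low_fit = quantile_line(task, clinical, tau=0.25)
high_fit = quantile_line(task, clinical, tau=0.75)
```

Comparing `low_fit` with `high_fit` shows whether the relation between the two measures changes across the outcome distribution. The pair enumeration is cubic in the sample size, so a dedicated solver would be the practical choice for data sets of realistic size.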


Figures from the quantile correlation analyses are reported in Appendices D-G. The quantile correlations 
between FRA Vocabulary Knowledge and the PPVT-IV (Appendix D) show that, in general, the two
assessments are more strongly correlated for students who performed lower on the PPVT-IV.
The implication is that lower performance on the PPVT-IV is correlated with low performance on the
Vocabulary Knowledge task. At higher levels of the PPVT-IV the correlation is still moderate but less than
that observed at the lower level of PPVT-IV. To better capture the nature of the relations between the
variables, Table 21 provides a summary of the average correlation between the two tasks by grade for
students who are low on the PPVT-IV (i.e., <40th quantile/percentile), average (40th-60th
quantile/percentile), and high (>60th quantile/percentile). The quantile correlations demonstrate a trend
that higher correlations between the measures are observed for students who score low or average on 
the PPVT-IV. A similar trend is generally observed for the FRA Word Recognition Task in its relation to 
the two TOWRE outcomes (Appendix E and F; Table 21) as well as for the Syntactic Knowledge Task 
(Appendix G; Table 21). 
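The banded summary in Table 21 can be pictured as averaging quantile-specific correlations within three ranges. The sketch below assumes that the 40th and 60th quantiles both belong to the middle band, which the report does not specify; the function name and the input values are hypothetical and are not the estimates plotted in the appendices.

```python
def band_averages(quantile_correlations):
    """Average (quantile, correlation) pairs within the <40, 40-60, and >60 bands."""
    bands = {"<40": [], "40-60": [], ">60": []}
    for quantile, r in quantile_correlations:
        if quantile < 40:
            bands["<40"].append(r)
        elif quantile <= 60:  # assumption: both endpoints fall in the middle band
            bands["40-60"].append(r)
        else:
            bands[">60"].append(r)
    return {name: sum(v) / len(v) for name, v in bands.items() if v}


# Hypothetical quantile-specific correlations for one grade and task.
estimates = [
    (10, 0.64), (20, 0.60), (30, 0.56),
    (40, 0.50), (50, 0.48), (60, 0.46),
    (70, 0.42), (80, 0.40), (90, 0.38),
]
summary = band_averages(estimates)
```

A declining pattern across the three bands, as in this invented example, is the kind of trend the paragraph above describes for most grades and tasks.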


Table 21 
Average correlations within ranges of quantiles/percentiles by grade and task


Quantile/Percentile Range 


FRA Task Outcome Grade <40 40-60 >60 
Vocabulary Knowledge PPVT-IV 3 0.60 0.48 0.40 
4 0.60 0.50 0.42 

5 0.66 0.67 0.52 

6 0.67 0.58 0.54 

7 0.66 0.63 0.55 

8 0.52 0.34 0.25 

9 0.72 0.56 0.51 

10 0.72 0.70 0.54 

Word Recognition TOWRE Real Word 3 0.47 0.47 0.34 
4 0.44 0.41 0.40 

5 0.19 0.19 0.16 

6 0.54 0.48 0.49 

7 0.53 0.45 0.40 

8 0.11 0.28 0.38 

9 0.31 0.29 0.43 

10 0.19 0.31 0.37 

Word Recognition TOWRE Non-Word 3 0.45 0.48 0.35 
4 0.50 0.47 0.35 

5 0.39 0.31 0.27 

6 0.57 0.38 0.36 

7 0.67 0.38 0.33 

8 0.55 0.37 0.38 

9 0.48 0.41 0.29 

10 0.52 0.33 0.18 

Syntactic Knowledge GJT 3 0.44 0.52 0.52 
4 0.66 0.58 0.58 

5 0.50 0.52 0.40 

6 0.50 0.48 0.37 

7 0.71 0.41 0.50 

8 0.70 0.48 0.30 

9 0.39 0.38 0.47 

10 0.61 0.55 0.52 


Note. PPVT-IV = Peabody Picture Vocabulary Test, 4th Edition; GJT = Grammaticality Judgment Test;
TOWRE = Test of Word Reading Efficiency.
Discriminant validity. Discriminant validity was evaluated by estimating correlations between
the FRA tasks and variables that should not be related to measures of reading: sex and birthdate (Table
22). Results indicated that weak associations were generally observed across grade levels.
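A correlation between a task score and a two-category variable such as sex is an ordinary Pearson correlation computed with the category coded 0/1 (the point-biserial case). The sketch below is a minimal illustration; the scores, the coding, and the function name are hypothetical, and the report does not state how sex or birthdate were coded.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical task scores and a 0/1 indicator for sex.
scores = [410, 455, 390, 520, 480, 445, 500, 430]
sex = [0, 1, 0, 1, 1, 0, 0, 1]
r_sex = pearson_r(scores, sex)
```

Values near zero, like most entries in Table 22, are what discriminant validity evidence looks like under this approach.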


Table 22 


Correlations between FRA tasks and birthdate/sex 


Grade Task Birthdate Sex 
3 Vocabulary Knowledge 0.10 0.11 
Word Recognition 0.09 0.08 
Reading Comprehension 0.11 0.22 

Syntax Knowledge 0.04 0.08 

4 Vocabulary Knowledge 0.16 -0.02 
Word Recognition 0.21 0.03 
Reading Comprehension 0.14 0.14 

Syntax Knowledge 0.09 0.04 

5 Vocabulary Knowledge 0.13 -0.12 
Word Recognition 0.02 -0.01 
Reading Comprehension 0.23 0.13 

Syntax Knowledge 0.17 -0.12 

6 Vocabulary Knowledge 0.26 -0.20 
Word Recognition 0.14 -0.01 
Reading Comprehension 0.28 -0.20 

Syntax Knowledge 0.23 -0.20 

7 Vocabulary Knowledge 0.01 -0.12 
Word Recognition 0.20 0.00 
Reading Comprehension 0.12 -0.06 
Syntax Knowledge 0.22 0.05 

8 Vocabulary Knowledge 0.01 -0.26 
Word Recognition 0.12 -0.13 
Reading Comprehension 0.09 0.04 
Syntax Knowledge 0.12 -0.16 

9 Vocabulary Knowledge 0.15 -0.10 
Word Recognition 0.12 -0.10 
Reading Comprehension 0.12 0.01 
Syntax Knowledge 0.18 0.12 

10 Vocabulary Knowledge 0.20 0.04 
Word Recognition 0.14 0.02 
Reading Comprehension 0.18 0.10 
Syntax Knowledge 0.20 0.17 
Appendix A: G3-G12 Weights


Table A1 


Population values for each grade for each of the sixteen demographic groups 


Grade 
Race FRL ELL 3 4 5 6 7 8 9 10 
White Yes Yes 0.00 0.20 0.64 0.25 0.33 0.36 0.17 0.43 
White Yes No 18.11 17.52 17.26 17.69 16.80 16.37 15.24 13.48 
White No Yes 0.09 0.30 0.09 0.13 0.00 0.07 0.25 0.14 
White No No 22.02 23.22 23.61 23.69 25.00 25.95 27.56 29.39 
Black Yes Yes 0.18 0.30 0.27 0.19 0.47 0.43 0.25 0.57 
Black Yes No 19.75 18.72 18.80 18.88 18.20 17.53 16.49 15.55 
Black No Yes 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
Black No No 3.00 3.20 3.36 3.75 4.20 4.50 5.83 6.56 
Hispanic Yes Yes 6.82 5.51 7.27 7.75 7.07 6.79 2.83 3.85 
Hispanic Yes No 16.65 17.42 15.26 14.38 14.53 14.37 16.49 14.41 
Hispanic No Yes 0.18 0.20 0.18 1.00 0.40 0.79 0.67 0.86 
Hispanic No No 6.73 6.81 6.81 6.06 6.93 7.01 8.33 8.92 
Other Yes Yes 0.00 0.30 0.09 0.25 0.53 0.21 0.25 0.36 
Other Yes No 3.46 3.21 3.36 3.00 2.60 2.57 2.41 2.07 
Other No Yes 0.09 0.10 0.00 0.06 0.07 0.07 0.08 0.00 
Other No No 2.91 3.00 3.00 2.94 2.87 2.93 3.16 3.42 


Note. Not all race/ethnicity subgroups are represented due to limited information provided when evaluating interactions among race/ethnicity (i.e., White,
Black, Hispanic, Other), free/reduced lunch status (eligible or ineligible), and English language learner status (identified or not identified). Students in
grades 11 and 12 use the grade 10 distribution of ability scores. FRL = Free/reduced price lunch. ELL = English language learners.
Table A2


Sample weight values for Reading Comprehension Task 


Grade 
Race FRL ELL 3 4 5 6 7 8 9 10 
White Yes Yes 0.00 0.77 1.16 1.09 1.22 2.00 1.13 1.65 
White Yes No 0.91 1.04 1.04 1.32 1.26 1.34 1.10 1.08 
White No Yes 0.41 1.58 1.29 0.57 0.00 0.44 1.92 1.08 
White No No 0.58 0.53 0.52 0.67 0.71 0.72 0.97 0.96 
Black Yes Yes 1.64 2.73 2.45 0.61 0.87 0.61 0.45 1.04 
Black Yes No 2.08 2.11 2.06 1.31 1.18 1.16 0.85 0.92 
Black No Yes 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
Black No No 0.86 0.92 1.09 1.04 1.40 1.16 1.34 1.17 
Hispanic Yes Yes 1.93 1.92 1.96 1.41 1.39 1.42 1.04 1.23 
Hispanic Yes No 1.83 1.93 2.03 0.91 0.96 0.98 0.94 0.92 
Hispanic No Yes 0.33 0.54 0.62 1.23 0.59 1.08 1.56 1.37 
Hispanic No No 1.05 1.28 1.39 1.24 1.31 1.17 1.52 1.41 
Other Yes Yes 0.00 1.00 0.23 0.96 1.39 0.72 1.39 1.24 
Other Yes No 1.11 1.10 0.99 1.15 1.03 1.11 0.78 0.69 
Other No Yes 0.24 0.29 0.00 0.46 0.64 0.44 0.80 0.00 
Other No No 0.53 0.60 0.71 1.31 1.01 1.16 0.91 0.81 


Note. Not all race/ethnicity subgroups are represented due to limited information provided when evaluating interactions among race/ethnicity (i.e., White,
Black, Hispanic, Other), free/reduced lunch status (eligible or ineligible), and English language learner status (identified or not identified). Students in
grades 11 and 12 use the grade 10 distribution of ability scores. FRL = Free/reduced price lunch. ELL = English language learners.
Table A3


Sample weight values for Vocabulary Knowledge Task 


Grade 
Race FRL ELL 3 4 5 6 7 8 9 10 
White Yes Yes 0.00 2.00 0.64 0.25 0.33 0.36 0.17 0.43 
White Yes No 0.69 0.67 0.66 0.89 0.87 0.72 0.69 0.77 
White No Yes 9.00 0.30 0.09 0.13 0.00 0.07 0.25 0.14 
White No No 0.84 0.81 0.82 1.06 1.01 0.90 1.01 0.89 
Black Yes Yes 0.90 1.67 1.50 0.19 2.76 2.53 0.68 0.57 
Black Yes No 1.77 2.01 1.52 1.14 1.00 1.18 1.07 1.06 
Black No Yes 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
Black No No 1.00 0.96 0.97 0.65 0.72 0.93 1.44 1.08 
Hispanic Yes Yes 2.85 10.40 5.39 7.83 6.04 5.18 2.16 3.16 
Hispanic Yes No 0.93 0.88 0.89 0.56 0.71 0.71 0.87 1.02 
Hispanic No Yes 18.00 1.11 0.95 5.88 2.35 4.65 5.58 0.86 
Hispanic No No 1.35 1.55 1.36 1.41 1.66 1.74 1.88 1.31 
Other Yes Yes 0.00 3.00 0.47 0.25 0.53 0.21 2.08 0.36 
Other Yes No 0.96 1.14 1.34 1.52 1.04 1.53 0.89 0.92 
Other No Yes 9.00 0.56 0.00 0.06 0.07 0.07 0.08 0.00 
Other No No 0.86 0.71 1.20 1.37 0.95 2.90 0.95 0.85 


Note. Not all race/ethnicity subgroups are represented due to limited information provided when evaluating interactions among race/ethnicity (i.e., White,
Black, Hispanic, Other), free/reduced lunch status (eligible or ineligible), and English language learner status (identified or not identified). Students in
grades 11 and 12 use the grade 10 distribution of ability scores. FRL = Free/reduced price lunch. ELL = English language learners.
Table A4


Sample weight values for Word Recognition Task 


Grade 
Race FRL ELL 3 4 5 6 7 8 9 10 
White Yes Yes 0.00 1.18 0.64 0.25 0.33 0.36 1.89 0.43 
White Yes No 1.71 1.63 1.60 2.45 2.23 2.45 2.82 3.56 
White No Yes 0.09 0.30 0.09 0.13 0.00 0.44 1.32 0.14 
White No No 0.52 0.51 0.54 0.55 0.49 0.50 0.59 0.48 
Black Yes Yes 0.18 0.30 0.27 0.19 2.94 0.43 2.78 6.33 
Black Yes No 0.83 0.84 0.87 0.67 0.84 1.01 0.72 1.00 
Black No Yes 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
Black No No 0.32 0.38 0.29 0.36 0.41 0.39 0.48 0.60 
Hispanic Yes Yes 45.47 16.21 51.93 75.00 70.00 6.79 2.83 10.69 
Hispanic Yes No 9.05 14.64 6.63 9.59 12.01 11.98 16.49 11.44 
Hispanic No Yes 1.20 1.18 0.18 1.00 0.40 4.94 3.53 3.19 
Hispanic No No 2.58 2.66 2.37 2.82 2.83 3.92 2.74 3.96 
Other Yes Yes 0.00 0.88 0.64 1.67 1.61 0.64 0.89 0.36 
Other Yes No 1.07 1.45 2.92 1.09 1.14 1.31 1.70 2.30 
Other No Yes 0.20 0.59 0.00 0.06 0.44 0.21 0.89 0.00 
Other No No 0.57 0.53 0.55 0.83 1.17 0.49 0.51 0.92 


Note. Not all race/ethnicity subgroups are represented due to limited information provided when evaluating interactions among race/ethnicity (i.e., White,
Black, Hispanic, Other), free/reduced lunch status (eligible or ineligible), and English language learner status (identified or not identified). Students in
grades 11 and 12 use the grade 10 distribution of ability scores. FRL = Free/reduced price lunch. ELL = English language learners.
Table A5


Sample weight values for Syntactic Knowledge Task 


Grade 
Race FRL ELL 3 4 5 6 7 8 9 10 
White Yes Yes 0.00 1.67 1.00 1.00 1.00 36.00 17.00 43.00 
White Yes No 2.39 2.14 2.27 2.31 3.23 14.36 14.65 12.96 
White No Yes 0.29 1.00 1.00 1.00 0.00 7.00 25.00 14.00 
White No No 0.50 0.47 0.43 0.39 0.37 0.33 0.37 0.38 
Black Yes Yes 0.10 0.14 0.33 0.23 2.94 43.00 2.78 57.00 
Black Yes No 0.83 0.98 1.15 1.08 1.26 2.12 1.31 1.35 
Black No Yes 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
Black No No 0.46 0.55 0.51 0.59 0.58 0.73 0.89 0.93
Hispanic Yes Yes 9.34 2.36 5.08 13.36 70.70 679.00 283.00 385.00 
Hispanic Yes No 3.27 3.83 3.32 4.61 29.65 29.33 24.98 120.08 
Hispanic No Yes 0.43 1.67 1.80 100.00 4.00 79.00 67.00 86.00 
Hispanic No No 2.31 3.89 2.67 3.50 4.28 14.31 14.61 38.78 
Other Yes Yes 0.00 2.50 0.29 2.08 3.31 1.31 25.00 36.00 
Other Yes No 1.23 1.06 2.35 1.52 1.78 1.76 4.23 9.00 
Other No Yes 0.17 0.83 0.00 0.50 0.44 0.44 0.89 0.00 
Other No No 1.12 0.99 0.89 1.16 1.61 1.39 0.88 2.28 


Note. Not all race/ethnicity subgroups are represented due to limited information provided when evaluating interactions among race/ethnicity (i.e., White, Black, Hispanic, Other),
free/reduced lunch status (eligible or ineligible), and English language learner status (identified or not identified). Students in grades 11 and 12 use the grade 10 distribution of ability
scores. FRL = Free/reduced price lunch. ELL = English language learners. Note that Table A1 should be used with Tables A2 through A5. Large sample weights reflect subgroups
that needed to be weighted more in the analyses; however, a large value does not necessarily indicate gross under-sampling. For example, Table A5 shows that Hispanic
students who are FRL and ELL have large weights in grades 8-10 (i.e., 679, 283, and 385). Note also that Table A1 shows that Hispanic students who are FRL and ELL constitute
only 6.79% of the state population in grade 8. Thus, the large sample weight reflects the need to weight the smaller sample by a factor of 679 so that it can adequately reflect
the state population at an appropriate level.
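The report does not give the formula used to compute the sample weights. A common post-stratification approach, assumed here for illustration only, divides each cell's share of the population by its share of the sample. Under that assumption, the grade 8 weight of 679 for Hispanic students who are FRL and ELL would correspond to a population share of 6.79% (Table A1) and a sample share of 0.01%; the sample share is a hypothetical value chosen to reproduce the published weight, not a figure from the report.

```python
def poststratification_weight(population_pct, sample_pct):
    """Assumed rule: weight = population share / sample share for one cell."""
    if sample_pct <= 0:
        raise ValueError("cell has no sampled students; weight is undefined")
    return population_pct / sample_pct


# Population share from Table A1; the sample share is hypothetical.
weight = poststratification_weight(population_pct=6.79, sample_pct=0.01)
```

A weight above 1 indicates a cell that is under-represented in the sample relative to the population, and a weight below 1 indicates one that is over-represented.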
Appendix B: Distribution of the Log Odds and Predicted
Probability of Success on the SAT-10 at the 40th Percentile
Appendix C: Distribution of the Log Odds and Predicted
Probability of Success on the SAT-10 at the 70th Percentile
Appendix D: Quantile Correlations between FRA Vocabulary Knowledge and PPVT-IV
Appendix E: Quantile Correlations between FRA Word Recognition and TOWRE Real Word
Appendix F: Quantile Correlations between FRA Word Recognition and TOWRE Non-Word


[Figure: quantile correlation plots in eight panels, Grade 3 through Grade 10; x-axis: Student Achievement]


FRA | Appendices 


© 2014 State of Florida, Department of Education. All Rights Reserved. 


Appendix G: Quantile Correlations between FRA Syntax Knowledge and GJT 
Appendix H: ESE Eligibility Definitions in Florida


• Autism Spectrum Disorder (ASD) - Autism Spectrum Disorder is defined to be a range of
pervasive developmental disorders that adversely affects a student's functioning and
results in the need for specially designed instruction and related services. Autism
Spectrum Disorder is characterized by an uneven developmental profile and a pattern of
qualitative impairments in social interaction, communication, and the presence of
restricted, repetitive, and/or stereotyped patterns of behavior, interests, or activities.
These characteristics may manifest in a variety of combinations and range from mild to 
severe. Autism Spectrum Disorder may include Autistic Disorder, Pervasive 
Developmental Disorder Not Otherwise Specified, Asperger’s Disorder, or other related 
pervasive developmental disorders. The corresponding definition is found in State Board 
of Education Rule 6A-6.03023, F.A.C. 


• Deaf or Hard-of-Hearing (DHH) - A student who is deaf or hard-of-hearing has a hearing
loss, aided or unaided, that impacts the processing of linguistic information and which
adversely affects performance in the educational environment. The degree of loss may 
range from mild to profound. 


¢ Ages Birth - 5 Years 
o Birth Through 2 Years - A prekindergarten child with disabilities is a child who is 
below five (5) years of age on or before September 1 and has a sensory, physical, 
mental, or emotional condition which significantly affects the attainment of 
normal developmental milestones. 
= Established Conditions (EC): Ages Birth Through 2 Years Old - A child 
with an established condition is defined as a child from birth through two 
(2) years of age with a diagnosed physical or mental condition known to 
have a high probability of resulting in developmental delay or disability. 
Such conditions shall include genetic disorders, metabolic disorders, 
neurological abnormalities and insults, or severe attachment disorder. 
▪ Developmentally Delayed (DD): Ages Birth Through 2 Years Old - A child 
who is developmentally delayed is defined as a child from birth through 
two years of age who has a delay in one (1) or more of the following 
areas: 


1. Adaptive or self-help development; 
2. Cognitive development; 
3. Communication development; 
4. Social/emotional development; 
5. Physical/motor development. 


These definitions are found in State Board of Education Rules 6A-6.03031, 
F.A.C., and 6A-6.03030, F.A.C. 


o Ages Three Through Five Years — A prekindergarten child with disabilities is a 
child who is below five (5) years of age on or before September 1 and has a 
sensory, physical, mental, or emotional condition which significantly affects the 
attainment of normal developmental milestones. 

▪ Developmentally Delayed (DD): Ages 3-5 Years - A child who is 
developmentally delayed is three (3) through five (5) years of age and is 
delayed in one (1) or more of the following areas: 


1. Adaptive or self-help development, 
2. Cognitive development, 
3. Communication development, 
4. Social or emotional development, 
5. Physical development including fine, or gross, or perceptual motor. 


This definition is found in State Board of Education Rule 6A-6.03026, 
F.A.C. 


• Dual Sensory Impairment (DSI): Deaf-Blind - A student who has dual-sensory 
impairments affecting both vision and hearing, the combination of which causes a 
serious impairment in the abilities to acquire information, communicate, or function 
within the environment, or who has a degenerative condition which will lead to such an 
impairment. 


• Emotional/Behavioral Disability (E/BD) - A student with an emotional/behavioral 
disability has persistent (is not sufficiently responsive to implemented evidence-based 
interventions) and consistent emotional or behavioral responses that adversely affect 
performance in the educational environment that cannot be attributed to age, culture, 
gender, or ethnicity. The corresponding definition is found in State Board of Education 
Rule 6A-6.03016, F.A.C. 


• Gifted - 


• Homebound or Hospitalized (HH) - A homebound or hospitalized student is a student 
who has a medically diagnosed physical or psychiatric condition which is acute or 
catastrophic in nature, or a chronic illness, or a repeated intermittent illness due to a 
persisting medical problem and that confines the student to home or hospital, and 
restricts activities for an extended period of time. The corresponding definition is found 
in State Board of Education Rule 6A-6.03020, Florida Administrative Code (F.A.C.). 


• Intellectual Disability (InD) - An intellectual disability is defined as significantly below 
average general intellectual and adaptive functioning manifested during the 
developmental period, with significant delays in academic skills. Developmental period 
refers to birth to eighteen (18) years of age. 


• Language Impairment (LI) - Language impairments are disorders of language that 
interfere with communication, adversely affect performance and/or functioning in the 
student’s typical learning environment, and result in the need for exceptional student 
education. A language impairment is defined as a disorder in one or more of the basic 
learning processes involved in understanding or in using spoken or written language. 
These include: 


1. Phonology — Phonology is defined as the sound systems of a language and the 
linguistic conventions of a language that guide the sound selection and sound 
combinations used to convey meaning; 

2. Morphology — Morphology is defined as the system that governs the internal 
structure of words and the construction of word forms; 

3. Syntax — Syntax is defined as the system governing the order and combination of 
words to form sentences, and the relationships among the elements within a 
sentence; 

4. Semantics — Semantics is defined as the system that governs the meanings of words 
and sentences; and 

5. Pragmatics — Pragmatics is defined as the system that combines language 
components in functional and socially appropriate communication. 


The language impairment may manifest in significant difficulties affecting listening 
comprehension, oral expression, social interaction, reading, writing, or spelling. A 
language impairment is not primarily the result of factors related to chronological age, 
gender, culture, ethnicity, or limited English proficiency. This definition is found in State 
Board of Education Rule 6A-6.030121, F.A.C. 


• Other Health Impairment (OHI) - Other health impairment means having limited 
strength, vitality or alertness, including a heightened alertness to environmental stimuli, 
that results in limited alertness with respect to the educational environment, that is due 
to chronic or acute health problems. This includes, but is not limited to, asthma, 
attention deficit disorder or attention deficit hyperactivity disorder, Tourette syndrome, 
diabetes, epilepsy, a heart condition, hemophilia, lead poisoning, leukemia, nephritis, 
rheumatic fever, sickle cell anemia, and acquired brain injury. This definition is found in 
State Board of Education Rule, Florida Administrative Code (F.A.C.). 


• Orthopedic Impairment (OI) - Orthopedic impairment means a severe skeletal, 
muscular, or neuromuscular impairment. The term includes impairments resulting from 
congenital anomalies (e.g., including but not limited to skeletal deformity or spina 
bifida), and impairments resulting from other causes (e.g., including but not limited to 
cerebral palsy or amputations). This definition is found in State Board of Education Rule 
6A-6.030151, F.A.C. 


• Specific Learning Disability (SLD) - A specific learning disability is defined as a disorder in 
one or more of the basic learning processes involved in understanding or in using 
language, spoken or written, that may manifest in significant difficulties affecting the 
ability to listen, speak, read, write, spell, or do mathematics. Associated conditions may 
include, but are not limited to, dyslexia, dyscalculia, dysgraphia, or developmental 
aphasia. A specific learning disability does not include learning problems that are 
primarily the result of a visual, hearing, motor, intellectual, or emotional/behavioral 
disability, limited English proficiency, or environmental, cultural, or economic factors. 
This definition is found in State Board of Education Rule 6A-6.03018, F.A.C. 


• Speech Impairment (SI) - Speech impairments are disorders of speech sounds, fluency, 
or voice that interfere with communication, adversely affect performance and/or 
functioning in the educational environment, and result in the need for exceptional 
student education. 


1. Speech sound disorder — A speech sound disorder is a phonological or articulation 
disorder that is evidenced by the atypical production of speech sounds characterized 
by substitutions, distortions, additions, or omissions that interfere with intelligibility. 
A speech sound disorder is not primarily the result of factors related to chronological 
age, gender, culture, ethnicity, or limited English proficiency. 

1. Phonological disorder — A phonological disorder is an impairment in the 
system of phonemes and phoneme patterns within the context of spoken 
language. 

2. Articulation disorder — An articulation disorder is characterized by difficulty 
in the articulation of speech sounds that may be due to a motoric or 
structural problem. 

2. Fluency disorder — A fluency disorder is characterized by deviations in continuity, 
smoothness, rhythm, or effort in spoken communication. It may be accompanied by 
excessive tension and secondary behaviors, such as struggle and avoidance. A 
fluency disorder is not primarily the result of factors related to chronological age, 
gender, culture, ethnicity, or limited English proficiency. 

3. Voice disorder — A voice disorder is characterized by the atypical production or 
absence of vocal quality, pitch, loudness, resonance, or duration of phonation that is 
not primarily the result of factors related to chronological age, gender, culture, 
ethnicity, or limited English proficiency. 


This definition is found in State Board of Education Rule 6A-6.03012, F.A.C. 


• Traumatic Brain Injury (TBI) - A traumatic brain injury means an acquired injury to the 
brain caused by an external physical force resulting in total or partial functional 
disability or psychosocial impairment, or both, that adversely affects educational 
performance. The term applies to mild, moderate, or severe, open or closed head 
injuries resulting in impairments in one (1) or more areas such as cognition, language, 
memory, attention, reasoning, abstract thinking, judgment, problem-solving, sensory, 
perceptual and motor abilities, psychosocial behavior, physical functions, information 
processing, or speech. The term includes anoxia due to trauma. The term does not 
include brain injuries that are congenital, degenerative, or induced by birth trauma. 


This definition is found in State Board of Education Rule 6A-6.030153, F.A.C. 


• Visual Impairment (VI): Blind and Partially Sighted - Students who are visually impaired 
include students who are blind, have no vision, or have little potential for using vision or 
students who have low vision. The term visual impairment does not include students 
who have learning problems that are primarily the result of visual perceptual and/or 
visual motor difficulties. 


The corresponding definition is found in State Board of Education Rule 6A-6.03014, 
Florida Administrative Code (F.A.C.). 